Abstract

We study the joint asymptotic behavior of the space requirement and the total path length (either summing over all root-key distances or over all root-node distances) in random $m$-ary search trees. The covariance turns out to exhibit a change of asymptotic behavior: it is essentially linear when $3 \le m \le 13$ but becomes of higher order when $m \ge 14$. Surprisingly, the corresponding asymptotic correlation coefficient tends to zero when $3 \le m \le 26$ but is periodically oscillating for larger $m$, and we also prove asymptotic independence when $3 \le m \le 26$. Such a less anticipated phenomenon is not exceptional and we extend the results in two directions: one for more general shape parameters, and the other for other classes of random log-trees such as fringe-balanced binary search trees and quadtrees. The methods of proof combine asymptotic transfer for the underlying recurrence relations with the contraction method.
1 Introduction

The $m$-ary search trees are a class of data structures introduced by Muntz and Uzgalis [35] in 1971 to support efficient searching and sorting of data; see the next section for more details. When constructed from a random permutation of $n$ elements, the space requirement $S_n$ (the total number of nodes needed to store the input) of such random $m$-ary search trees ($m \ge 3$) is known to exhibit a phase change phenomenon: its distribution is asymptotically Gaussian for large $n$ when the branching factor $m$ satisfies $3 \le m \le 26$ but does not approach a limit law when $m \ge 27$; see [8, 22, 30, 31] and the references therein. On the other hand, it is also known that the total key path length $K_n$ (the sum of the distances from the root to all keys) does not change its limiting behavior when $m$ varies: after being properly centered and normalized, it tends to a limit law for each $m \ge 3$. Another closely related shape measure, the total node path length $N_n$ (the sum of the distances from the root to all nodes), behaves asymptotically in a very similar way.

Our motivating question was “how does $K_n$ or $N_n$ depend on $S_n$?” Surprisingly, despite the strong dependence of the definition of $N_n$ on $S_n$ (see (2)), we show that the correlation coefficient $\rho(S_n, N_n)$ satisfies

$$\rho(S_n, N_n) \begin{cases} \to 0, & \text{if } 3 \le m \le 26; \\ \sim F_\rho(\beta \log n), & \text{if } m \ge 27, \end{cases} \tag{1}$$

where $F_\rho(t)$ is a $2\pi$-periodic function and $\beta = \beta_m$ is a structural constant depending on $m$. The same type of result also holds for $\rho(S_n, K_n)$. In words, $N_n$ and $S_n$ are asymptotically uncorrelated for $3 \le m \le 26$, and their correlation fluctuates (between $-1$ and $1$) for $m \ge 27$; see Figure 1 for an illustration.



Figure 1: The periodic functions $F_\rho(2\pi t)$ for $m = 27, \dots, 100$ (left) and $F_\rho(\beta \log n)$ for $m = 27, 54, \dots, 270$ (right).

One reason why the result (1) may seem counter-intuitive is the seemingly strong dependence of $N_n$ on $S_n$ in the recursive equations satisfied by both random variables:

$$\begin{cases} S_n \stackrel{d}{=} S^{(1)}_{I_1} + \cdots + S^{(m)}_{I_m} + 1, \\ N_n \stackrel{d}{=} N^{(1)}_{I_1} + \cdots + N^{(m)}_{I_m} + S^{(1)}_{I_1} + \cdots + S^{(m)}_{I_m}, \end{cases} \tag{2}$$

where the $(S^{(r)}_i, N^{(r)}_i)$ are independent copies of $(S_i, N_i)$, respectively, also independent of $(I_1, \dots, I_m)$, and

$$\mathbb{P}(I_1 = i_1, \dots, I_m = i_m) = \frac{1}{\binom{n}{m-1}}, \tag{3}$$

when $i_1, \dots, i_m \ge 0$ and $i_1 + \cdots + i_m = n - m + 1$. Intuitively, we expect from the above relations that the node path length $N_n$ would be strongly correlated with $S_n$.

While one might ascribe this seemingly less intuitive result to the possibly nonlinear dependence between $N_n$ and $S_n$, we strengthen the lack of correlation by a joint limit law for $(S_n, N_n)$ when $3 \le m \le 26$, which further accentuates the asymptotic independence between $N_n$ and $S_n$; for $m \ge 27$, they are asymptotically dependent and we will derive a precise characterization of their joint asymptotic distributions. See Section 4 for a more precise description of the joint asymptotic behaviors of $(S_n, N_n)$ and $(S_n, K_n)$.

Let $\alpha$ denote the real part of the second largest zero (in real part) of the indicial equation $\Lambda(z) = 0$, where

$$\Lambda(z) = z(z+1)\cdots(z+m-2) - m!. \tag{4}$$

Then $\alpha < 1$ for $m < 14$ and $1 < \alpha < \frac{3}{2}$ for $14 \le m \le 26$; see Table 1. Also $\alpha \to 2$ as $m \to \infty$; see [30, Sec. 3.3] for more properties of $\alpha$. The main reason that $\rho(S_n, N_n) \to 0$ for


m     3       4       5       6       7       8       9       10
α    -3      -2.5    -1.5    -0.768  -0.260   0.101   0.366   0.568

m     11      12      13      14      15      16      17      18
α     0.726   0.852   0.955   1.040   1.112   1.173   1.226   1.272

m     19      20      21      22      23      24      25      26
α     1.313   1.348   1.380   1.409   1.435   1.458   1.479   1.499

Table 1: Approximate numerical values of $\alpha = \alpha_m$ for $3 \le m \le 26$.


$3 \le m \le 26$ is roughly that their covariance is of order $\max\{n \log n, n^{\alpha}\}$ (see Theorem 2.3 below), while the standard deviations of $S_n$ and $N_n$ are of orders $\sqrt{n}$ and $n$, respectively, so that

$$\rho(S_n, N_n) = \begin{cases} O\bigl(n^{-1/2} \log n\bigr), & \text{if } 3 \le m \le 13; \\ O\bigl(n^{\alpha - 3/2}\bigr), & \text{if } 14 \le m \le 26, \end{cases}$$

which tends to zero in both cases. Briefly, the large quadratic variance of $N_n$ is the major cause of the asymptotic independence between $S_n$ and $N_n$ for $3 \le m \le 26$.
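
The entries of Table 1 can be approximated numerically. The following is a minimal sketch, not the method used in the paper: it assumes the indicial polynomial $\Lambda(z) = z(z+1)\cdots(z+m-2) - m!$ from (4) and finds all of its zeros with the Durand-Kerner iteration, evaluating $\Lambda$ in product form. The function names, the starting points and the iteration limits are illustrative choices; for large $m$ more iterations or higher precision may be needed.

```python
import math


def indicial(z, m):
    """Lambda(z) = z(z+1)...(z+m-2) - m!, evaluated in product form."""
    product = 1
    for j in range(m - 1):
        product *= z + j
    return product - math.factorial(m)


def indicial_zeros(m, max_iterations=5000, tolerance=1e-13):
    """Approximate all m-1 zeros of Lambda by the Durand-Kerner iteration (m >= 3)."""
    degree = m - 1
    # Distinct, asymmetric starting points of modulus close to m (a heuristic).
    zeros = [m * (0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(max_iterations):
        largest_step = 0.0
        for k in range(degree):
            denominator = 1
            for l in range(degree):
                if l != k:
                    denominator *= zeros[k] - zeros[l]
            step = indicial(zeros[k], m) / denominator
            zeros[k] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tolerance:
            break
    return zeros


def alpha(m):
    """Real part of the zero of Lambda with the second largest real part."""
    real_parts = sorted((z.real for z in indicial_zeros(m)), reverse=True)
    return real_parts[1]


if __name__ == "__main__":
    for m in (3, 4, 5, 6):
        print(m, round(alpha(m), 3))
```

Note that $z = 2$ is always a zero of $\Lambda$ (since $2 \cdot 3 \cdots m = m!$), so the largest real part is $2$ and `alpha` returns the next one.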

Such a change from being asymptotically independent to being asymptotically dependent under a varying structural parameter is not an exception. We will extend our study to fringe-balanced binary search trees and quadtrees; a typical related instance states that the number of comparisons (or exchanges) used by the median-of-$(2t+1)$ quicksort is asymptotically independent of the number of partitioning stages when $0 \le t \le 58$, but is asymptotically dependent for $t \ge 59$.


2 M-ary search trees 

We briefly introduce $m$-ary search trees in this section and then describe the random variables we are studying in this paper.

An $m$-ary tree is either empty or consists of a single node called the root, together with an ordered $m$-tuple of subtrees, each of which is, by definition, an $m$-ary tree. Given a sequence of numbers, say $\{x_1, \dots, x_n\}$, we construct an $m$-ary search tree ($m \ge 2$) by the following procedure. If $1 \le n < m$, then all keys are stored in the root. If $n \ge m$, then the first $m-1$ keys are sorted and stored in the root, and the remaining keys are directed to the $m$ subtrees, each corresponding to one of the $m$ intervals formed by the $m-1$ sorted keys in the root node; see Figure 2 for an illustration (the rectangular nodes denote yet empty subtrees of full nodes). If the $m-1$ numbers in the root are $x_{j_1} < \cdots < x_{j_{m-1}}$, then the keys directed to the $i$th subtree all have their values lying between $x_{j_{i-1}}$ and $x_{j_i}$, where $x_{j_0} := 0$ and $x_{j_m} := n+1$. All subtrees are themselves $m$-ary search trees by definition. For more details, see Mahmoud [30].

Figure 2: Three $m$-ary search trees for the sequence $\{6, 2, 4, 8, 7, 1, 5, 3, 10, 9\}$; $m = 2$ (left), $m = 3$ (middle), and $m = 4$ (right).

While the practical usefulness of $m$-ary search trees is largely overshadowed by their balanced counterparts such as B-trees, they have been a source of many interesting phenomena, which are to some extent universal. The study of $m$-ary search trees is thus of fundamental and prototypical value. Furthermore, the close connection between $m$-ary search trees and generalized quicksort adds an extra dimension to the richness of diverse variations and their asymptotic behaviors.

2.1 Space requirement and total path lengths 

Assume that the input sequence $\{x_1, \dots, x_n\}$ is a random permutation, where all $n!$ permutations are equally likely. The resulting $m$-ary search tree constructed from the given sequence is then called a random $m$-ary search tree. The major shape parameters of particular algorithmic interest include the depth, the height, the space requirement, the total path length, and the profile; see [11, 30] for more information. We are concerned in this paper with the following three random variables.

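• $S_n$ (space requirement): the total number of nodes used to store the input; the three trees in Figure 2 have $S_{10}$ equal to 10, 6, 6, respectively. If $m = 2$, then $S_n = n$; if $m \ge 3$, we can compute $S_n$ recursively by $S_0 = 0$, and

$$S_n \stackrel{d}{=} \begin{cases} 1, & \text{if } 1 \le n < m, \\ S^{(1)}_{I_1} + \cdots + S^{(m)}_{I_m} + 1, & \text{if } n \ge m, \end{cases} \tag{5}$$

where the $S^{(r)}_i$ are independent copies of $S_i$, $1 \le r \le m$, $0 \le i \le n-m+1$, and independent of $(I_1, \dots, I_m)$ defined in (3).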





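• $K_n$ (key path length, KPL): the sum of the distances between the root and each key; for the trees in Figure 2, $K_{10}$ equals 19, 11, 8, respectively. For $m \ge 2$, $K_n$ satisfies the recurrence

$$K_n \stackrel{d}{=} \begin{cases} 0, & \text{if } n < m, \\ K^{(1)}_{I_1} + \cdots + K^{(m)}_{I_m} + n - m + 1, & \text{if } n \ge m, \end{cases} \tag{6}$$

where the $K^{(r)}_i$'s are independent copies of $K_i$, $1 \le r \le m$, $0 \le i \le n-m+1$, independent of $(I_1, \dots, I_m)$.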


• $N_n$ (node path length, NPL): the sum of the distances between the root and each node, so that $N_{10}$ equals 19, 7, 6 for the three trees in Figure 2. Obviously, $N_n = K_n$ when $m = 2$. When $m \ge 3$,

$$N_n \stackrel{d}{=} \begin{cases} 0, & \text{if } n < m, \\ N^{(1)}_{I_1} + \cdots + N^{(m)}_{I_m} + S^{(1)}_{I_1} + \cdots + S^{(m)}_{I_m}, & \text{if } n \ge m, \end{cases} \tag{7}$$

where the $(N^{(r)}_i, S^{(r)}_i)$'s are independent copies of $(N_i, S_i)$, $1 \le r \le m$, $0 \le i \le n-m+1$, independent of $(I_1, \dots, I_m)$.


While the first two random variables have been widely studied in the literature, NPL was only considered previously in [4, 21] in connection with the process of cutting trees. In addition to this, our interest was to understand the extent to which the asymptotic independence for small $m$ between $S_n$ and $K_n$ persists when the “toll function” changes from a linear function to a function that is random and may depend on $S_n$.
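
The three parameters can be computed directly from the construction rule of Section 2. The sketch below is an illustration only: it assumes distinct keys and follows the recursive decompositions (5)-(7), and the function name `shape_parameters` is a hypothetical helper, not something defined in the paper.

```python
from bisect import bisect_left


def shape_parameters(keys, m):
    """Return (S, K, N) for the m-ary search tree built from `keys` in order.

    S = number of nodes, K = key path length, N = node path length.
    Keys are assumed to be distinct.
    """
    n = len(keys)
    if n == 0:
        return (0, 0, 0)
    if n < m:
        # All keys fit into a single (root) node.
        return (1, 0, 0)
    pivots = sorted(keys[:m - 1])          # the m-1 keys stored in the root
    buckets = [[] for _ in range(m)]       # keys directed to the m subtrees
    for key in keys[m - 1:]:
        buckets[bisect_left(pivots, key)].append(key)
    space, key_path, node_path = 1, n - m + 1, 0
    for bucket in buckets:
        s, k, nn = shape_parameters(bucket, m)
        space += s
        key_path += k        # each non-root key is one level deeper: toll n-m+1
        node_path += nn + s  # each subtree node is one level deeper: toll s
    return (space, key_path, node_path)


if __name__ == "__main__":
    sequence = [6, 2, 4, 8, 7, 1, 5, 3, 10, 9]
    for m in (2, 3, 4):
        print(m, shape_parameters(sequence, m))
```

Run on the sequence of Figure 2, the sketch gives the values of $S_{10}$, $K_{10}$ and $N_{10}$ quoted in the three items above.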


2.2 A summary of known results 

Let $H_m := \sum_{1 \le j \le m} 1/j$ denote the harmonic numbers. Knuth [27, §6.2.4] was the first to show that

$$\mathbb{E}(S_n) \sim \phi n, \qquad \text{where } \phi := \frac{1}{2(H_m - 1)}$$

(see also [1]). Here $\phi$ denotes the “occupancy constant”, which will appear throughout our analysis. Mahmoud and Pittel [31] improved the result and derived an identity for $\mathbb{E}(S_n)$, which implies in particular that

$$\mathbb{E}(S_n) = \phi(n+1) - \frac{1}{m-1} + O\bigl(n^{\alpha-1}\bigr),$$

where $\alpha$ has the same meaning as in the Introduction; see (4). They also discovered and proved the surprising result for the variance

$$\mathbb{V}(S_n) \sim \begin{cases} C_S n, & \text{if } 3 \le m \le 26; \\ F_1(\beta \log n)\, n^{2\alpha-2}, & \text{if } m \ge 27, \end{cases}$$

where $C_S$ is a constant depending on $m$, $F_1$ is a $\pi$-periodic function given in (24), $\alpha + i\beta$ is the second largest zero (in real part) with $\beta > 0$ of the equation $\Lambda(z) = 0$ (see (4)), and $2\alpha - 2 > 1$ for $m \ge 27$. See also [9, 25, 33] for a closely related fragmentation model with the same asymptotic behavior. A central limit theorem for $S_n$ was then proved for $3 \le m \le 26$ in [28, 31]; see also [30] for more details. Their approach is based on an inductive approximation argument.

By the method of moments, two authors of this paper re-proved in [8] the central limit theorem for $S_n$ when $3 \le m \le 26$; the same approach was also used to establish the nonexistence of a limit law for $S_n$ when $m \ge 27$, which is due to inherent oscillations. Moreover, the convergence rates to the normal distribution were characterized in [22] by a refined method of moments; these rates undergo a further change of behavior.

Several different approaches were then developed in the literature for a deeper understanding of the “phase change” at $m = 26$; these include martingales [6], renewal theory [25], urn models [23, 32], the contraction method [13, 39], the method of moments [22], statistical physics [9, 33], etc.

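On the other hand, the KPL for general $m \ge 2$ was first studied by Mahmoud [29], who proved

$$\mathbb{E}(K_n) = 2\phi n \log n + c_1 n + o(n),$$

for some explicitly computable constant $c_1$; see (21). The variance was computed in [30, §3.5] and satisfies

$$\mathbb{V}(K_n) \sim C_K n^2, \qquad \text{where } C_K = 4\phi^2 \left( \frac{(m+1)H_m^{(2)} - 2}{m-1} - \frac{\pi^2}{6} \right), \tag{8}$$

with $H_m^{(2)} := \sum_{1 \le j \le m} 1/j^2$.

The corresponding limit law was characterized in [38] by the contraction method:

$$\frac{K_n - \mathbb{E}(K_n)}{n} \stackrel{d}{\longrightarrow} K, \tag{9}$$

where $K$ is given by the recursive distributional equation (44); see also [4, 34] for a general framework.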

For the NPL $N_n$, Broutin and Holmgren [4] proved that

$$\mathbb{E}(N_n) = 2\phi^2 n \log n + c_2 n + o(n),$$

for some constant $c_2$ (for which no numerical value was provided); a series expression of $c_2$ is given in [21, p. 156]. We will give an alternative proof of this result below with tools from [8, 14]. Our approach makes the computation of $c_2$ feasible (although its exact value is not needed); see (27).

It should be mentioned that there is a large literature on $K_n$ when $m = 2$ because it is identical to the comparison cost of quicksort. Many fine results were obtained; see, for example, the recent papers [3, 12, 17, 20, 37, 41] and the references therein for more information.


2.3 Covariance, correlation, dependence and phase changes 

We state in this section our results for the covariance and correlation between the space requirement and the total path lengths (KPL and NPL). The proofs and the tools needed will be given in the next sections.

Unlike the space requirement $S_n$, whose variance changes its asymptotic behavior at $m = 27$, the covariance $\mathrm{Cov}(S_n, K_n)$ changes its asymptotic behavior at $m = 14$.

Theorem 2.1. The covariance between $S_n$ and $K_n$ satisfies

$$\mathrm{Cov}(S_n, K_n) \sim \begin{cases} C_R n, & \text{if } 3 \le m \le 13; \\ F_2(\beta \log n)\, n^{\alpha}, & \text{if } m \ge 14, \end{cases}$$

where $C_R$ is a suitable constant and $F_2(z)$ is a $2\pi$-periodic function given in (25) below.

This result has the following consequence.

Corollary 2.2. The correlation coefficient between $S_n$ and $K_n$ satisfies

$$\rho(S_n, K_n) \begin{cases} \to 0, & \text{if } 3 \le m \le 26; \\ \sim \dfrac{F_2(\beta \log n)}{\sqrt{C_K F_1(\beta \log n)}}, & \text{if } m \ge 27, \end{cases}$$

where $C_K > 0$ is given in (8).

See Figure 1 for two different plots of the periodic functions when $m \ge 27$.

The same consideration extends easily to clarify the correlation between the space requirement and NPL.

Theorem 2.3. The covariance between $S_n$ and $N_n$ satisfies

$$\mathrm{Cov}(S_n, N_n) \sim \begin{cases} 2\phi C_S n \log n, & \text{if } 3 \le m \le 13; \\ \phi F_2(\beta \log n)\, n^{\alpha}, & \text{if } m \ge 14, \end{cases}$$

where $C_S$ is as in Section 2.2. Moreover, the variance of $N_n$ satisfies

$$\mathbb{V}(N_n) \sim \phi^2 C_K n^2.$$

Notice the appearance of an extra $\log n$ factor when $3 \le m \le 13$, which reflects the additional random effect introduced by the toll function in (7). These estimates have the following consequence.

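Corollary 2.4. The correlation coefficient $\rho(S_n, N_n)$ satisfies

$$\rho(S_n, N_n) \begin{cases} \to 0, & \text{if } 3 \le m \le 26; \\ \sim \dfrac{F_2(\beta \log n)}{\sqrt{C_K F_1(\beta \log n)}}, & \text{if } m \ge 27. \end{cases}$$

The last relation suggests considering the correlation between $K_n$ and $N_n$.

Corollary 2.5. The random variable $K_n$ is asymptotically linearly correlated to $N_n$:

$$\rho(K_n, N_n) \to 1.$$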
Indeed, we will show that

$$\bigl\| N_n - \phi K_n - \mathbb{E}(N_n - \phi K_n) \bigr\|_2 = o(n),$$

which then by Slutsky’s theorem implies that

$$\left( \frac{K_n - \mathbb{E}(K_n)}{n}, \frac{N_n - \mathbb{E}(N_n)}{n} \right) \stackrel{d}{\longrightarrow} (K, \phi K);$$

see (9) and Sections 4.3 and 4.4.

These results will be proved by working out the asymptotics of the corresponding recurrence relations, which all have the same form

$$a_n = m \sum_{0 \le j \le n-m+1} \pi_{n,j}\, a_j + b_n \qquad (n \ge m-1),$$

where

$$\pi_{n,j} = \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}} \qquad (0 \le j \le n-m+1)$$

is a probability distribution, and $\{b_n\}$ is a given sequence (referred to as the toll function). For that asymptotic purpose, our key tools will rely on the asymptotic transfer techniques (see [8, 14]), which provide a direct asymptotic translation from the asymptotic behavior of $b_n$ to that of $a_n$. The remaining analysis will then consist of simplifying some multiple Dirichlet integrals.
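
To make the recurrence concrete, here is a small sketch that solves it exactly for the mean space requirement $\mu_n = \mathbb{E}(S_n)$, that is, with toll $b_n = 1$. It assumes the subtree-size distribution $\pi_{n,j} = \binom{n-1-j}{m-2} / \binom{n}{m-1}$ and the initial values $\mu_0 = 0$ and $\mu_n = 1$ for $1 \le n \le m-2$; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def occupancy_constant(m):
    """phi = 1 / (2 (H_m - 1)) as an exact fraction."""
    harmonic = sum(Fraction(1, j) for j in range(1, m + 1))
    return 1 / (2 * (harmonic - 1))


def expected_space(m, n_max):
    """Exact values mu_0, ..., mu_{n_max} of E(S_n) from the recurrence."""
    mu = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n <= m - 2:
            mu[n] = Fraction(1)      # all keys fit into the root node
            continue
        denominator = comb(n, m - 1)
        total = Fraction(0)
        for j in range(0, n - m + 2):
            total += Fraction(comb(n - 1 - j, m - 2), denominator) * mu[j]
        mu[n] = m * total + 1        # toll function b_n = 1
    return mu


if __name__ == "__main__":
    m = 3
    phi = occupancy_constant(m)
    mu = expected_space(m, 20)
    for n in (4, 5, 10, 20):
        print(n, mu[n], phi * (n + 1) - Fraction(1, m - 1))
```

For $m = 3$ the exact values agree with $\phi(n+1) - 1/(m-1)$ for the small $n$ checked here, which is consistent with the expansion of Mahmoud and Pittel quoted in Section 2.2.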

Since Pearson’s product-moment correlation coefficient $\rho$ is known to be poor in measuring nonlinear dependence between two random variables, we go further by considering the joint limit laws for $(S_n, K_n)$ and $(S_n, N_n)$, which exhibit a change of behavior depending on whether $3 \le m \le 26$ (convergent case) or $m \ge 27$ (periodic case): they are asymptotically independent in the former case but dependent in the latter.


Theorem 2.6. Assume $3 \le m \le 26$. Let $(X_n)_n \in \{(K_n)_n, (N_n)_n\}$ and let $Q_n = (X_n, S_n)$ denote the vector of the KPL or NPL and the space requirement of a random $m$-ary search tree. Then the convergence in distribution holds:

$$\mathrm{Cov}(Q_n)^{-1/2} \bigl( Q_n - \mathbb{E}[Q_n] \bigr) \stackrel{d}{\longrightarrow} (X, \mathcal{N}), \tag{10}$$

where $\mathcal{N}$ has the standard normal distribution and the limit law of $X$ is described in Lemma 4.2; moreover, $X$ and $\mathcal{N}$ are independent.

Theorem 2.7. Assume $m \ge 27$. Let $(X_n)_n \in \{(K_n)_n, (N_n)_n\}$ and

_ /X„-E[X„] Sn-(pn\ 

V ^xn ’ J 

with ix = 1/or {Xn)n = {,Nn)n and tx = far (X„)„ = {Kn)n- Then we have 

£2(l/,(X,3fJ(n*^A)))^0, 


where (3 is as in Section 2.2 and (X, A) is a random vector whose distribution is specified as the 
unique fixed point solution appearing in Lemma 4.1 for the choice y = f), 9) (9 being defined 
below in (28)). 
See Section 4 for a more precise formulation. The proof is based on the contraction method (see [36]), where we use the above moment asymptotics as input and combine well-known estimates within the minimal $L_2$-metric for the convergent case (as in [40]) with estimates for the periodic case (as in [13]). Similar proof techniques related to periodic distributional behaviors are also applied in [25, Theorem 1.3(iii)] and [26, Theorem 6.10]. If one is only interested in the asymptotic (univariate) distribution of the NPL $N_n$ (the case of the KPL being known before), there are more direct proofs, which we also discuss in Sections 4.3 and 4.4.

Our study of the dependence of random variables on random $m$-ary search trees can be extended in at least two directions by the same methods used in this paper, namely, asymptotic transfer techniques and the contraction method.

• Extension to more general linear and $n \log n$ shape measures: That the asymptotic covariance undergoes a phase change after $m = 13$ and the asymptotic correlation undergoes a phase change after $m = 26$ is not restricted to the space requirement and KPL or NPL. Indeed, we can replace the space requirement by many other linear shape measures such as the number of leaves, the number of nodes of a specified type, the number of occurrences of a fixed pattern, etc. (see [8] for more examples), and KPL or NPL by other shape measures with mean of order $n \log n$ such as the sum of the root-node or root-key distances over certain specified nodes or patterns, and the weighted path length.

• Extension to other random trees of logarithmic height: the same change of asymptotic behavior from being independent to being dependent under a varying structural parameter also occurs in other classes of random log-trees; we content ourselves with a brief discussion of two classes of random trees: fringe-balanced binary search trees and quadtrees. The behaviors will, however, be very different for the classes of trees where the underlying distribution of the subtree sizes is dictated by a binomial distribution, which will be examined elsewhere; see the companion paper [18] for more information.

This paper is organized as follows. We prove in the next section our results for the covariances and the correlations. These results are then used to study the bivariate distributional asymptotics in Section 4 by the multivariate contraction method (see [36]). Finally, in Section 5, we discuss the dependence and phase changes in fringe-balanced binary search trees and in quadtrees, where for the former we study the joint behavior of the size and total path length, while for the latter (since the size is a constant) we consider the joint behavior of the number of leaves and total path length. We also include in Section 5 a brief discussion of how to extend the study and results to other shape parameters.

3 Correlation between space requirement and path lengths 

We prove in this section Theorems 2.1 and 2.3 for the covariances $\mathrm{Cov}(S_n, K_n)$ and $\mathrm{Cov}(S_n, N_n)$, respectively.

3.1 Preliminaries and recurrences 

We collect here the notations to be used in the proofs. Let $m \ge 2$ be a fixed integer. For $n \ge m$, denote by $(I_1^{(n)}, \dots, I_m^{(n)})$ the vector of the numbers of keys inserted in the $m$ ordered subtrees of the root in a random $m$-ary search tree with $n$ keys. When the dependence on $n$ is obvious, we write simply $(I_1, \dots, I_m)$. Generate independently $n$ uniform random variables $U_1, \dots, U_n$ on $[0,1]$. Store the first $m-1$ elements $U_1, \dots, U_{m-1}$ in the root node of the tree. They decompose the unit interval $[0,1]$ into spacings of lengths $V_1, \dots, V_m$, where $V_j = U_{(j)} - U_{(j-1)}$ for $j = 1, \dots, m$ with $U_{(0)} := 0$, $U_{(m)} := 1$, and $U_{(j)}$ for $j = 1, \dots, m-1$ are the order statistics of $U_1, \dots, U_{m-1}$. The uniform permutation model implies that, conditional on $U_1, \dots, U_{m-1}$, the vector $(I_1, \dots, I_m)$ has the multinomial distribution with success probabilities $V_1, \dots, V_m$, namely, we have

$$(I_1, \dots, I_m) \stackrel{d}{=} M(n-m+1; V_1, \dots, V_m).$$

In particular, we have the convergence

$$\frac{I_r}{n} \longrightarrow V_r, \tag{11}$$

for all $r = 1, \dots, m$, where the convergence is in $L_p$ for all $1 \le p < \infty$. Note that we also have (3) for all $m$-tuples $i_1, \dots, i_m \ge 0$ with $i_1 + \cdots + i_m = n-m+1$ and all $n \ge m$.
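
The two-stage description above (uniform spacings first, then a multinomial split of the remaining keys) can be simulated directly. The following sketch is only an illustration; the function name and the use of a seeded `random.Random` instance are assumptions made for reproducibility.

```python
import random


def sample_subtree_sizes(n, m, rng):
    """Sample (I_1, ..., I_m) for a random m-ary search tree with n >= m-1 keys.

    Stage 1: the m-1 root keys are i.i.d. uniform on [0, 1]; their order
    statistics cut [0, 1] into m spacings V_1, ..., V_m.
    Stage 2: each of the remaining n-m+1 keys falls independently into
    spacing r with probability V_r (a multinomial split).
    """
    cuts = sorted(rng.random() for _ in range(m - 1))
    sizes = [0] * m
    for _ in range(n - m + 1):
        u = rng.random()
        r = sum(1 for c in cuts if c < u)   # index of the spacing containing u
        sizes[r] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(2024)
    print(sample_subtree_sizes(100, 4, rng))
```

Averaging $I_r / n$ over many samples gives an empirical view of the convergence (11).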

For each of the subtrees, the randomness (uniformity) is preserved; more precisely, conditional on the number of keys inserted in a subtree, each subtree has the same distribution as a random $m$-ary search tree with that number of keys in the uniform model. Moreover, conditional on $(I_1, \dots, I_m)$, the subtrees are independent. This can be seen by switching back to the ranks $\{1, \dots, n\}$ of the input elements, and then by checking that a uniform random permutation yields independent permutations on the respective ranges. This recursive structure of the random $m$-ary search tree implies the recursive relations for $S_n$, $K_n$ and $N_n$ given in (5)-(7), where the summands appearing on the right-hand sides, namely, $S_j^{(1)}, \dots, S_j^{(m)}$ and $K_j^{(1)}, \dots, K_j^{(m)}$ and $N_j^{(1)}, \dots, N_j^{(m)}$, have the same distributions as $S_j$, $K_j$ and $N_j$, respectively. Furthermore, the triples

$$\Bigl( \bigl(S_j^{(r)}\bigr)_{0 \le j \le n-m+1}, \bigl(K_j^{(r)}\bigr)_{0 \le j \le n-m+1}, \bigl(N_j^{(r)}\bigr)_{0 \le j \le n-m+1} \Bigr)$$

are independent for $r = 1, \dots, m$ and independent of $(I_1, \dots, I_m)$. Finally, the recursive structure of the $m$-ary search tree implies recurrences satisfied by their joint distributions. In particular, the pair $Q_n := (N_n, S_n)$ satisfies the recurrence

$$(Q_n)^T \stackrel{d}{=} \sum_{1 \le r \le m} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \bigl( Q^{(r)}_{I_r} \bigr)^T + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad (n \ge m), \tag{12}$$

where, as in (5)-(7), the $Q_j^{(r)}$’s are distributed as $Q_j$ for all $1 \le r \le m$ and $0 \le j \le n-m+1$, and the $(Q_j^{(r)})_{0 \le j \le n-m+1}$ are independent for $r = 1, \dots, m$ and independent of $(I_1, \dots, I_m)$.

The recurrence satisfied by the pair $Z_n := (K_n, S_n)$ is

$$(Z_n)^T \stackrel{d}{=} \sum_{1 \le r \le m} \bigl( Z^{(r)}_{I_r} \bigr)^T + \begin{pmatrix} n-m+1 \\ 1 \end{pmatrix} \qquad (n \ge m), \tag{13}$$

with conditions on independence and identical distributions similar to (12).
3.2 Asymptotic transfer and Dirichlet integrals

Starting from the distributional recurrences (5) and (6), we see that all centered and non-centered moments satisfy the same recurrence of the following type

$$a_n = m \sum_{0 \le j \le n-m+1} \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}}\, a_j + b_n, \tag{14}$$

for $n \ge m-1$, where $\{b_n\}_{n \ge m-1}$ is a given sequence. The asymptotics of $a_n$ can be systematically characterized by that of $b_n$ through the use of the following transfer techniques; see Proposition 7 in [8] and Theorem 2.4 in [14] for details.

Proposition 3.1. Assume that $a_n$ satisfies (14) with finite initial conditions $a_0, \dots, a_{m-2}$. Define $b_n := a_n$ for $0 \le n \le m-2$.

(i) Assume $b_n = c(n+1) + t_n$, where $c \in \mathbb{C}$. Then the conditions

$$t_n = o(n) \qquad \text{and} \qquad \Bigl| \sum_{n \ge 1} t_n n^{-2} \Bigr| < \infty$$

are both necessary and sufficient for

$$a_n = 2c\phi\, n H_n + c' n + o(n),$$


where 




^ + - — 2c4> + 2c(H^^^ — 


j^O 


(j + l)(j + 2) 2 


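(ii) If $b_n \sim c n^{v}$, where $v > 1$, then

$$a_n \sim c \left( 1 - \frac{m!\, \Gamma(v+1)}{\Gamma(v+m)} \right)^{-1} n^{v}.$$

In particular, when $c = 0$ in (i), then we see that $a_n$ is asymptotically linear,

$$\frac{a_n}{n} \longrightarrow 2\phi \sum_{j \ge 0} \frac{b_j}{(j+1)(j+2)},$$

iff $b_n = o(n)$ and

$$\Bigl| \sum_{n \ge 1} b_n n^{-2} \Bigr| < \infty.$$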
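
As a sanity check of part (ii), one can iterate the recurrence numerically and compare $a_n / n^{v}$ with the predicted constant. The sketch below is an illustration under stated assumptions: it uses the recurrence $a_n = m \sum_{0 \le j \le n-m+1} \pi_{n,j} a_j + b_n$ with $\pi_{n,j} = \binom{n-1-j}{m-2} / \binom{n}{m-1}$, zero initial values, and the toll $b_n = n^2$; the function names are hypothetical. Convergence is slow (the relative error decays roughly like $\log n / n$), so only rough agreement should be expected for moderate $n$.

```python
import math


def transfer_factor(m, v):
    """Constant 1 / (1 - m! Gamma(v+1) / Gamma(v+m)) from Proposition 3.1(ii)."""
    return 1.0 / (1.0 - math.factorial(m) * math.gamma(v + 1) / math.gamma(v + m))


def solve_recurrence(m, toll, n_max):
    """Iterate a_n = m * sum_j pi_{n,j} a_j + toll(n) with a_n = 0 for n <= m-2."""
    a = [0.0] * (n_max + 1)
    for n in range(m - 1, n_max + 1):
        denominator = math.comb(n, m - 1)
        total = 0.0
        for j in range(0, n - m + 2):
            total += math.comb(n - 1 - j, m - 2) / denominator * a[j]
        a[n] = m * total + toll(n)
    return a


if __name__ == "__main__":
    m, n = 2, 400
    a = solve_recurrence(m, lambda k: float(k * k), n)
    print(a[n] / n ** 2, transfer_factor(m, 2))
```

For $m = 2$ and $v = 2$ the predicted constant is $3$, and the ratio $a_n / n^2$ approaches it from below.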


We will be dealing with Dirichlet integrals of the following type

$$I(u,v) := \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( \sum_{1 \le i \le m} x_i^{u-1} \Bigr) \Bigl( \sum_{1 \le r \le m} x_r^{v-1} \Bigr) dx \qquad (\Re(u), \Re(v) > 0).$$

Here $dx$ is an abbreviation for $dx_1 \cdots dx_{m-1}$. Such integrals have a closed-form expression.

Lemma 3.2. For $m \ge 2$ and $\Re(u), \Re(v) > 0$,

$$I(u,v) = \frac{m \Gamma(u+v-1) + m(m-1) \Gamma(u) \Gamma(v)}{\Gamma(u+v+m-2)}. \tag{15}$$
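Proof. First, the claim is easily proved for $m = 2$. Assume $m \ge 3$. Then, by symmetry,

$$\begin{aligned} I(u,v) &= \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \bigl( m x_1^{u+v-2} + m(m-1) x_1^{u-1} x_2^{v-1} \bigr) dx \\ &= \frac{m}{(m-2)!} \int_0^1 x_1^{u+v-2} (1-x_1)^{m-2} dx_1 + \frac{m(m-1)}{(m-3)!} \int_0^1 \int_0^{1-x_1} x_1^{u-1} x_2^{v-1} (1-x_1-x_2)^{m-3} dx_2\, dx_1 \\ &= \frac{m \Gamma(u+v-1)}{\Gamma(u+v+m-2)} + \frac{m(m-1) \Gamma(u) \Gamma(v)}{\Gamma(u+v+m-2)}, \end{aligned}$$

which leads to (15). ∎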
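
The closed form (15) is easy to check numerically. The sketch below is an illustration, not part of the proof: it assumes that the integrand is $\bigl(\sum_i x_i^{u-1}\bigr)\bigl(\sum_r x_r^{v-1}\bigr)$ and estimates the integral by Monte Carlo, using the fact that a uniform point on the simplex can be generated by normalizing independent exponential variables and that the simplex has volume $1/(m-1)!$ with respect to $dx_1 \cdots dx_{m-1}$. Function names and sample sizes are arbitrary choices.

```python
import math
import random


def dirichlet_integral_closed_form(u, v, m):
    """Right-hand side of (15)."""
    numerator = m * math.gamma(u + v - 1) + m * (m - 1) * math.gamma(u) * math.gamma(v)
    return numerator / math.gamma(u + v + m - 2)


def dirichlet_integral_monte_carlo(u, v, m, samples, rng):
    """Monte Carlo estimate of I(u, v) over the (m-1)-dimensional simplex."""
    total = 0.0
    for _ in range(samples):
        # Normalized i.i.d. exponentials are uniformly distributed on the simplex.
        e = [rng.expovariate(1.0) for _ in range(m)]
        s = sum(e)
        x = [t / s for t in e]
        total += sum(t ** (u - 1) for t in x) * sum(t ** (v - 1) for t in x)
    # Average of the integrand times the volume 1/(m-1)! of the simplex.
    return total / samples / math.factorial(m - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for m in (2, 3, 5):
        exact = dirichlet_integral_closed_form(3, 2, m)
        estimate = dirichlet_integral_monte_carlo(3, 2, m, 20000, rng)
        print(m, exact, estimate)
```

For $u = 3$, $v = 2$ the integrand reduces to $\sum_i x_i^2$, so the exact values $2/3$, $1/4$ and $1/72$ for $m = 2, 3, 5$ can also be verified by hand.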

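The following two identities will be needed below.

$$\begin{aligned} \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( \sum_{1 \le i \le m} x_i^{u-1} \Bigr) \Bigl( \sum_{1 \le r \le m} x_r \log x_r \Bigr) dx &= \frac{\partial}{\partial v} I(u,v) \Big|_{v=2} \\ &= \frac{m \Gamma(u)}{\Gamma(m+u)} \bigl( u \psi(u+1) + (m-1)(1-\gamma) - (m+u-1) \psi(m+u) \bigr), \end{aligned} \tag{16}$$

where $\psi$ is the digamma function and $\gamma$ is Euler’s constant. Similarly,

$$\begin{aligned} \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( \sum_{1 \le r \le m} x_r \log x_r \Bigr)^2 dx &= \frac{\partial^2}{\partial u\, \partial v} I(u,v) \Big|_{u=v=2} \\ &= \frac{1}{(m-1)!} \left( H_m^{(2)} + \frac{1}{4\phi^2} - \frac{2}{m+1} - \frac{(m-1)\pi^2}{6(m+1)} \right). \end{aligned} \tag{17}$$

3.3 Correlation between the space requirement and KPL

We are now ready to prove Theorem 2.1.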


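Expected values of $S_n$ and $K_n$. For convenience, let $\mu_n := \mathbb{E}(S_n)$ and $\kappa_n := \mathbb{E}(K_n)$. Then, by (5) and (6), for $n \ge m-1$,

$$\mu_n = m \sum_{0 \le j \le n-m+1} \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}}\, \mu_j + 1, \qquad \kappa_n = m \sum_{0 \le j \le n-m+1} \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}}\, \kappa_j + n - m + 1,$$

with the initial conditions $\mu_0 = \kappa_n = 0$ for $0 \le n \le m-2$ and $\mu_n = 1$ for $1 \le n \le m-2$. By applying Proposition 3.1(i), we obtain

$$\mu_n \sim \phi n, \qquad \text{and} \qquad \kappa_n = 2\phi n \log n + c_1 n + o(n), \tag{18}$$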
for some constant $c_1$ whose value matters less; see (21) below. The latter approximation is sufficient for all our purposes, but the former is not and we need the following stronger expansion (see [8, 31, 30]):

$$\mu_n = \phi(n+1) - \frac{1}{m-1} + \sum_{2 \le k \le 3} \frac{A_k}{\Gamma(\lambda_k)}\, n^{\lambda_k - 1} + o\bigl(n^{\alpha-1}\bigr), \tag{19}$$

where $\lambda_2 := \alpha + i\beta$ and $\lambda_3 := \alpha - i\beta$ and

$$A_k = \frac{1}{\lambda_k (\lambda_k - 1) \displaystyle\sum_{0 \le j \le m-2} \frac{1}{j + \lambda_k}}. \tag{20}$$

Note that for $3 \le m \le 13$ the constant term $-\frac{1}{m-1}$ (together with $\phi$) is the second-order term on the right-hand side of (19), while for larger $m$, it is absorbed in the $o$-term.

On the other hand, although the explicit expression of $c_1$ is not needed in this paper, we provide its expression here since the known ones (see [29, 30]) are less explicit and it can be easily obtained from Proposition 3.1:




( 21 ) 


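Variance and covariance. To compute the asymptotics of the covariance, we first derive the corresponding recurrences and then apply the asymptotic transfer of Proposition 3.1.

First, let $\bar{S}_n := S_n - \mu_n$ and $\bar{K}_n := K_n - \kappa_n$. We consider the moment-generating function

$$P_n(u,v) := \mathbb{E}\bigl( e^{\bar{S}_n u + \bar{K}_n v} \bigr).$$

Then, using (5) and (6), we obtain for $n \ge m-1$

$$P_n(u,v) = \frac{1}{\binom{n}{m-1}} \sum_{\mathbf{j}} P_{j_1}(u,v) \cdots P_{j_m}(u,v)\, e^{\Delta_{\mathbf{j}} u + \nabla_{\mathbf{j}} v}, \tag{22}$$

with the initial conditions $P_n(u,v) = 1$ for $0 \le n \le m-2$. Here, $\mathbf{j} = (j_1, \dots, j_m)$ is a vector with $j_1, \dots, j_m \ge 0$ and $j_1 + \cdots + j_m = n-m+1$ (we use this notation throughout),

$$\Delta_{\mathbf{j}} = 1 - \mu_n + \sum_{1 \le l \le m} \mu_{j_l}, \qquad \text{and} \qquad \nabla_{\mathbf{j}} = n - m + 1 - \kappa_n + \sum_{1 \le l \le m} \kappa_{j_l}. \tag{23}$$

Define

$$V_n^{[S]} := \mathbb{V}(S_n), \qquad V_n^{[SK]} := \mathrm{Cov}(S_n, K_n), \qquad V_n^{[K]} := \mathbb{V}(K_n).$$

Then, by taking derivatives in (22), we obtain

$$V_n^{[X]} = m \sum_{0 \le j \le n-m+1} \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}}\, V_j^{[X]} + b_n^{[X]} \qquad (X \in \{S, SK, K\}),$$

where

$$b_n^{[S]} = \frac{1}{\binom{n}{m-1}} \sum_{\mathbf{j}} \Delta_{\mathbf{j}}^2, \qquad b_n^{[SK]} = \frac{1}{\binom{n}{m-1}} \sum_{\mathbf{j}} \Delta_{\mathbf{j}} \nabla_{\mathbf{j}}, \qquad \text{and} \qquad b_n^{[K]} = \frac{1}{\binom{n}{m-1}} \sum_{\mathbf{j}} \nabla_{\mathbf{j}}^2.$$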

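We first derive uniform asymptotic approximations for $\Delta_{\mathbf{j}}$ and $\nabla_{\mathbf{j}}$.

Lemma 3.3. Uniformly in $\mathbf{j}$,

$$\Delta_{\mathbf{j}} = \sum_{2 \le k \le 3} \frac{A_k}{\Gamma(\lambda_k)}\, n^{\lambda_k - 1} \Bigl( -1 + \sum_{1 \le r \le m} \Bigl( \frac{j_r}{n} \Bigr)^{\lambda_k - 1} \Bigr) + o\bigl(n^{\alpha-1}\bigr),$$

and

$$\nabla_{\mathbf{j}} = n \Bigl( 1 + 2\phi \sum_{1 \le r \le m} \frac{j_r}{n} \log \frac{j_r}{n} \Bigr) + o(n).$$

Proof. This follows from substituting the asymptotic approximations (18) and (19) into (23), and standard manipulations. ∎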


Asymptotics of $V_n^{[S]}$. Although the asymptotic behaviors of the variance of $S_n$ have been computed before, we re-derive them here by a different approach, which is easily amended for the calculation of other variances and covariances.

Consider first $3 \le m \le 26$. Then $\alpha < 3/2$. Moreover, from Lemma 3.3,

$$b_n^{[S]} = O\bigl(n^{2\alpha-2}\bigr) = O\bigl(n^{1-\varepsilon}\bigr),$$

for some $0 < \varepsilon < 0.00171$. Consequently, by applying Proposition 3.1(i),

$$V_n^{[S]} \sim C_S n,$$

for some constant $C_S$; see [8] for a more explicit expression and the proof that $C_S > 0$.
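On the other hand, if $m \ge 27$, since $\alpha > 3/2$, we then have, by Lemmas 3.2 and 3.3,

$$\begin{aligned} b_n^{[S]} &\sim (m-1)! \sum_{2 \le k_1, k_2 \le 3} \frac{A_{k_1} A_{k_2}}{\Gamma(\lambda_{k_1}) \Gamma(\lambda_{k_2})}\, n^{\lambda_{k_1} + \lambda_{k_2} - 2} \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( -1 + \sum_{1 \le i \le m} x_i^{\lambda_{k_1} - 1} \Bigr) \Bigl( -1 + \sum_{1 \le r \le m} x_r^{\lambda_{k_2} - 1} \Bigr) dx \\ &= \sum_{2 \le k_1, k_2 \le 3} \frac{A_{k_1} A_{k_2}}{\Gamma(\lambda_{k_1}) \Gamma(\lambda_{k_2})}\, n^{\lambda_{k_1} + \lambda_{k_2} - 2} \Bigl( 1 - \frac{m!\, \Gamma(\lambda_{k_1})}{\Gamma(\lambda_{k_1} + m - 1)} - \frac{m!\, \Gamma(\lambda_{k_2})}{\Gamma(\lambda_{k_2} + m - 1)} \\ &\qquad\qquad + \frac{m!\, \Gamma(\lambda_{k_1} + \lambda_{k_2} - 1)}{\Gamma(\lambda_{k_1} + \lambda_{k_2} + m - 2)} + \frac{m!\,(m-1)\, \Gamma(\lambda_{k_1}) \Gamma(\lambda_{k_2})}{\Gamma(\lambda_{k_1} + \lambda_{k_2} + m - 2)} \Bigr). \end{aligned}$$

Note that

$$\frac{m!\, \Gamma(\lambda_k)}{\Gamma(\lambda_k + m - 1)} = 1 \qquad (2 \le k \le 3),$$

since $\lambda_k$ is a zero of (4). Applying Proposition 3.1(ii) term by term then gives

$$V_n^{[S]} \sim \sum_{2 \le k_1, k_2 \le 3} \frac{A_{k_1} A_{k_2}}{\Gamma(\lambda_{k_1}) \Gamma(\lambda_{k_2})} \Bigl( -1 + \frac{m!\,(m-1)\, \Gamma(\lambda_{k_1}) \Gamma(\lambda_{k_2})}{\Gamma(\lambda_{k_1} + \lambda_{k_2} + m - 2) - m!\, \Gamma(\lambda_{k_1} + \lambda_{k_2} - 1)} \Bigr) n^{\lambda_{k_1} + \lambda_{k_2} - 2} =: F_1(\beta \log n)\, n^{2\alpha - 2},$$

where

$$\begin{aligned} F_1(z) := {}& 2\, \frac{|A_2|^2}{|\Gamma(\lambda_2)|^2} \Bigl( -1 + \frac{m!\,(m-1)\, |\Gamma(\lambda_2)|^2}{\Gamma(2\alpha + m - 2) - m!\, \Gamma(2\alpha - 1)} \Bigr) \\ &+ 2\, \Re \Bigl( \frac{A_2^2\, e^{2iz}}{\Gamma(\lambda_2)^2} \Bigl( -1 + \frac{m!\,(m-1)\, \Gamma(\lambda_2)^2}{\Gamma(2\lambda_2 + m - 2) - m!\, \Gamma(2\lambda_2 - 1)} \Bigr) \Bigr). \end{aligned} \tag{24}$$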


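Asymptotics of $V_n^{[SK]}$. We now turn to $V_n^{[SK]}$. If $3 \le m \le 13$, then $\alpha < 1$ and, by Lemma 3.3,

$$b_n^{[SK]} = O\bigl(n^{1-\varepsilon}\bigr)$$

for some $\varepsilon > 0$. Consequently, by Proposition 3.1(i),

$$V_n^{[SK]} \sim C_R n,$$

for some constant $C_R$. For the remaining range where $m \ge 14$, we have $\alpha > 1$, and, by Lemma 3.3 and (16),

$$\begin{aligned} b_n^{[SK]} &\sim (m-1)! \sum_{2 \le k \le 3} \frac{A_k}{\Gamma(\lambda_k)}\, n^{\lambda_k} \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( -1 + \sum_{1 \le i \le m} x_i^{\lambda_k - 1} \Bigr) \Bigl( 1 + 2\phi \sum_{1 \le r \le m} x_r \log x_r \Bigr) dx \\ &= \sum_{2 \le k \le 3} \frac{A_k}{\Gamma(\lambda_k)}\, n^{\lambda_k} \Bigl( 1 - \frac{2\phi}{\lambda_k + m - 1} \bigl\{ (\lambda_k + m - 1) \psi(\lambda_k + m) - \lambda_k \psi(\lambda_k + 1) - (m-1)(1-\gamma) \bigr\} \Bigr). \end{aligned}$$

Now, we apply Proposition 3.1(ii) and obtain, after some straightforward simplifications,

$$V_n^{[SK]} \sim F_2(\beta \log n)\, n^{\alpha},$$

where

$$F_2(z) := 2\phi\, \Re \Bigl( \frac{(\lambda_2 + m - 1) A_2\, e^{iz}}{(m-1) \Gamma(\lambda_2)} \Bigl( \frac{1}{\phi} - \frac{2}{\lambda_2 + m - 1} \bigl\{ (\lambda_2 + m - 1) \psi(\lambda_2 + m) - \lambda_2 \psi(\lambda_2 + 1) - (m-1)(1-\gamma) \bigr\} \Bigr) \Bigr). \tag{25}$$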


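Asymptotics of $V_n^{[K]}$. In a similar manner, we obtain, by Lemma 3.3,

$$\begin{aligned} b_n^{[K]} &\sim (m-1)!\, n^2 \int_{\substack{x_1 + \cdots + x_m = 1 \\ 0 \le x_1, \dots, x_m \le 1}} \Bigl( 1 + 2\phi \sum_{1 \le l \le m} x_l \log x_l \Bigr)^2 dx \\ &= 4\phi^2 n^2 \left( H_m^{(2)} - \frac{2}{m+1} - \frac{\pi^2 (m-1)}{6(m+1)} \right), \end{aligned}$$

where the last line follows from applying (15), (16) and (17). Applying again Proposition 3.1(ii) gives

$$V_n^{[K]} \sim C_K n^2,$$

which completes the proof of Theorem 2.1. ∎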
3.4 Correlation between space requirement and NPL

The calculations in this case are similar to those for $\rho(S_n, K_n)$, so we only sketch the major steps needed. Briefly, most asymptotic estimates differ only by a factor of the occupancy constant $\phi$ or its powers. The only exception is the additional factor $\log n$ appearing in the covariance $\mathrm{Cov}(S_n, N_n)$ (see Theorem 2.3).

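Let $\nu_n := \mathbb{E}(N_n)$. Then

$$\nu_n = m \sum_{0 \le j \le n-m+1} \frac{\binom{n-1-j}{m-2}}{\binom{n}{m-1}}\, \nu_j + \mu_n - 1.$$

Consequently, by the asymptotic estimate (19) and by applying Proposition 3.1(i), we obtain

$$\nu_n = 2\phi^2 n \log n + c_2 n + o(n), \tag{26}$$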


where, by Proposition 3.1, 


C 2 = (j)Ci + 20 





21^ A 
2 - A, J ’ 


(27) 


Cl being given in (21) and the A^’s defined in (20). Indeed, consider the difference : = 
Un — (pKn, which then satisfies the same recurrence (14) but with the toll function 


:= /^n — 1 — 0(^ — m + 1) = (pm 





n + Xi 
n 




and = 0 for 1 ^ n ^ m — 2. Then by applying Proposition 3.1, we obtain 


C 2 — Ci<p 


24 . 

j^m—1 


Vj 

(j + l)(j + 2)' 


Since = —<p(n — m + 1) for 1 ^ n ^ m — 2 and rjo = <p(m — 1) — 1, we then derive (27) 
by the relation 

In particular, $c_2 - \phi c_1$ equals

12/125, 222/2197, 44670/456533, 710/7569, 8990170/99806103, 86959460/1001561769, 8225243460/97908438529, 9368632980/114862129381,

for $m = 3, \dots, 10$, and

13941168359580/175531341607271,
15364018080180/198165483844901,
36778736979244260/484907780151231137,
39706104830251860/534148059351752117,
42542306175669300/583013664848115773,
362341148683714200/5051607560589134719,
60809828396490973800/861420713064800471777,
220781849887636437400/3174476111482140491583,
1589879045909940738152200/23180880112213178399314917,
66535629228892650939112/982905224931956375768865,
69399644946307963559272/1037954891250806970920625,
72191400913204902200872/1092384284013327674677545,
911488027263952226045421464/13945777153309079949132939375,
943834826916499599456679304/14593082411910111966602252205,
3048229719576792424490262245800/47603282606571951420821994029889,
3144754504512378111611222765800/49580602253255626178697360169689,
787117453959995151898324789769400/12523181563980976087610969389067627,
809570585901011449194661971389400/12992983079952314295925927936613927,
20280854972612671613961769087339836600/328217277361176269245342166728792498003,
20806237502125190663861808383733444600/339424705221771320114642916145949390923

for $m = 11, \dots, 30$.

Let Nn = Nn — Vn- Then the moment-generating function Pn{u,v) := 
satisfies for n ^ m — 1 

Pn{u, v) = Y1 {u + v,v)--- Pj^ {u + V, 

Vm—1/ j 

with the initial conditions Pn{u,v) = 1 for 0 ^ n ^ m — 2 and 


Now define 

Then 

where 


~^n + ^ iPji + ^ji) ■ 




$$V_n^{[SN]} := \mathrm{Cov}(S_n, N_n) \quad\text{and}\quad V_n^{[N]} := \mathbb{V}(N_n).$$


V?'=™ E ’"-.Af + ijf', (X€{SN,N]), 

0 ^/^n—m +1 






L-.) t 


( w 

Vm—1/ j 


6 L"I = 


^ETf’ + 2v^r+^j 

Vm— 1/ j 

= l/M + 2l/f »1 + ' (i| - 2Ajij + A?) , 

Vm—1/ j 


As in the case of KPL, the following uniform estimate is crucial in our analysis.

Lemma 3.4. Uniformly in $\mathbf{j} = (j_1, \dots, j_m)$,

$$\delta_{\mathbf{j}} = \phi n \Bigl(1 + 2\phi \sum_{1 \le l \le m} \frac{j_l}{n} \log \frac{j_l}{n}\Bigr) + o(n).$$

Proof. By the definition of $\delta_{\mathbf{j}}$ and the estimates (19) and (26). ∎

Note that the expansion differs from the corresponding one in Lemma 3.3 by an additional factor $\phi$.
If 3 ^ m ^ 13, then, by Lemmas 3.3 and 3.4, 

6|f^] = Csn + 0 , 

for a sufficiently small e > 0. Thus, by Proposition 3.1 (i), 

^[5iv] _ Csnlogn 
Assume now $m \ge 14$. Then, again from Lemma 3.3 and Lemma 3.4 together with the known asymptotics of $V_n^{[S]}$, we see that


\m—lJ j Vm—1/ j 

Thus we deduce, as in the corresponding proof for KPL,

$$V_n^{[SN]} \sim \phi F_2(\beta \log n)\, n^{\alpha}.$$

Similarly, we have 



Consequently, 

This completes the proof of Theorem 2.3. ∎


4 Bivariate distributional asymptotics for space requirement 
and path lengths 

In this section, we identify the asymptotic joint distributional behaviors of the pairs $(N_n, S_n)$ and $(K_n, S_n)$. Although the sequences $(N_n)$ and $(K_n)$ converge after normalization for all $m \ge 3$ with limit distributions depending on $m$, we split the analysis into two cases depending on $3 \le m \le 26$ or $m > 26$ due to the phase change in the limit behavior of $S_n$. We discuss the pair $(N_n, S_n)$ in detail in Sections 4.1 and 4.2 (the corresponding analysis for $(K_n, S_n)$ is similar and we will not give details). Moreover, in Section 4.3, we will show that the univariate limit random variables of the normalized sequences $(N_n)$ and $(K_n)$ do have the same distribution, up to a constant factor.
We introduce the following notation

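$$\mu(n) := \mu_n = \mathbb{E}[S_n] = \phi(n+1) + \Re(\theta n^{\lambda_2 - 1}) + o(1 \vee n^{\alpha - 1}), \tag{28}$$

where $\theta := 2A_2/\Gamma(\lambda_2)$; see (19). Similarly, write $\kappa(n) = \kappa_n = \mathbb{E}(K_n)$ and $\nu(n) = \nu_n = \mathbb{E}(N_n)$.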

4.1 Node path length and space requirement. I. $m \ge 27$

We give in this section the precise formulation of the periodic case $m \ge 27$ of Theorem 2.7.

Normalization. We first normalize the vector $Q_n = (N_n, S_n)$ as follows. Let $Y_0 := 0$ and

$$Y_n := \Bigl(\frac{N_n - \mathbb{E}[N_n]}{n},\ \frac{S_n - \phi n}{n^{\alpha - 1}}\Bigr).$$












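Then the recurrence (12) implies, for $n \ge m - 1$,

$$Y_n^t \overset{d}{=} \sum_{1 \le r \le m} A_r^{(n)} \bigl(Y^{(r)}_{I_r^{(n)}}\bigr)^t + b^{(n)}, \tag{29}$$

where

$$A_r^{(n)} := \begin{pmatrix} \dfrac{I_r^{(n)}}{n} & \dfrac{(I_r^{(n)})^{\alpha - 1}}{n} \\ 0 & \Bigl(\dfrac{I_r^{(n)}}{n}\Bigr)^{\alpha - 1} \end{pmatrix}, \qquad b^{(n)} := \begin{pmatrix} \dfrac{1}{n}\Bigl(\sum_{1 \le r \le m} \bigl(\nu(I_r^{(n)}) + \phi I_r^{(n)}\bigr) - \nu(n)\Bigr) \\ \dfrac{1 - \phi(m - 1)}{n^{\alpha - 1}} \end{pmatrix},$$

with assumptions on independence and on identical distributions as in Section 3.1. The expansion (26) implies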

j{n) j{n) 

Jr _ Jr 


n 




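Moreover, by (11), we obtain the $L_2$-convergence

$$\Bigl(\frac{I_1^{(n)}}{n}, \dots, \frac{I_m^{(n)}}{n}\Bigr) \longrightarrow (V_1, \dots, V_m) =: V. \tag{30}$$

This implies the $L_2$-convergences

$$\frac{1}{n}\Bigl(\sum_{1 \le r \le m} \bigl(\nu(I_r^{(n)}) + \phi I_r^{(n)}\bigr) - \nu(n)\Bigr) \longrightarrow \phi + 2\phi^2 \sum_{1 \le r \le m} V_r \log V_r =: b_N, \tag{31}$$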


and 


^ ( 0 ^ 1 , 24 (-) ^ 


v; 0 

0 


(32) 


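For our limit result for $m \ge 27$, we first define a distribution which governs the asymptotics.

The limiting map. To describe the asymptotic behavior of $Q_n$, we use the following probability distribution on the space $\mathbb{R} \times \mathbb{C}$. Let $\mathcal{M}^{\mathbb{R} \times \mathbb{C}}$ denote the space of all distributions $\mathcal{L}(Z, W)$ on $\mathbb{R} \times \mathbb{C}$ and $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}$ the subspace of distributions with finite second moment, i.e., $\|(Z, W)\|_2 := (\mathbb{E}[Z^2] + \mathbb{E}[|W|^2])^{1/2} < \infty$. For $\gamma = (\gamma_1, \gamma_2) \in \mathbb{R} \times \mathbb{C}$, let

$$\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma) := \bigl\{\mathcal{L}(Z, W) \in \mathcal{M}_2^{\mathbb{R} \times \mathbb{C}} \,\big|\, \mathbb{E}[Z] = \gamma_1,\ \mathbb{E}[W] = \gamma_2\bigr\}.$$

We define the following map $T_N$ on $\mathcal{M}^{\mathbb{R} \times \mathbb{C}}$:

$$T_N : \mathcal{M}^{\mathbb{R} \times \mathbb{C}} \to \mathcal{M}^{\mathbb{R} \times \mathbb{C}}, \qquad \mathcal{L}(Z, W) \mapsto \mathcal{L}\left(\sum_{1 \le r \le m} \begin{pmatrix} V_r & 0 \\ 0 & V_r^{\lambda_2 - 1} \end{pmatrix} \begin{pmatrix} Z^{(r)} \\ W^{(r)} \end{pmatrix} + \begin{pmatrix} b_N \\ 0 \end{pmatrix}\right), \tag{33}$$

where $(Z^{(1)}, W^{(1)}), \dots, (Z^{(m)}, W^{(m)}), V$ are independent, $(Z^{(r)}, W^{(r)})$ is distributed as $(Z, W)$ for all $r = 1, \dots, m$ and $b_N$ is defined in (31). The $\|\cdot\|_2$-norm induces the minimal $L_2$-metric $\ell_2$ by

$$\ell_2(\mu, \nu) := \inf\{\|X - Y\|_2 : \mathcal{L}(X) = \mu,\ \mathcal{L}(Y) = \nu\}, \qquad (\mu, \nu \in \mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}).$$

Given random variables $X, Y$, write for simplicity $\ell_2(X, Y) = \ell_2(\mathcal{L}(X), \mathcal{L}(Y))$. For any distributions $\mu, \nu \in \mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}$ there exist optimal $\ell_2$-couplings, i.e., random vectors $T_1, T_2$ in $\mathbb{R} \times \mathbb{C}$ with distributions $\mu$ and $\nu$, respectively, such that $\ell_2(\mu, \nu) = \|T_1 - T_2\|_2$.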

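Lemma 4.1. Assume $m \ge 27$. For any $\gamma \in \mathbb{R} \times \mathbb{C}$, the restriction of the map $T_N$ defined in (33) to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$ is a (strict) contraction with respect to $\ell_2$, and has a unique fixed point in $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$.

Proof. Let $\gamma \in \mathbb{R} \times \mathbb{C}$ be arbitrary. For $\mu \in \mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$, let $T$ be a random variable with distribution $T_N(\mu)$. First, note that $\|T\|_2 < \infty$ by independence and $\|b_N\|_2 < \infty$ (we even have $\|b_N\|_\infty < \infty$). To see that $\mathbb{E}[T] = \gamma$, note that $\mathbb{E}[b_N] = 0$ and $\sum_{1 \le r \le m} V_r = 1$ almost surely. Hence, we only need to show that $\mathbb{E}[V_r^{\lambda_2 - 1}] = 1/m$. Since $V_r$ has density $x \mapsto (m - 1)(1 - x)^{m - 2}$ for $x \in [0, 1]$, we see that

$$\mathbb{E}\bigl[V_r^{\lambda_2 - 1}\bigr] = \int_0^1 x^{\lambda_2 - 1} (m - 1)(1 - x)^{m - 2}\, dx = (m - 1)\frac{\Gamma(m - 1)\Gamma(\lambda_2)}{\Gamma(m + \lambda_2 - 1)} = \frac{1}{m},$$

because $\Gamma(m + \lambda_2 - 1)/\Gamma(\lambda_2) = m!$. This implies that $\mathbb{E}[T] = \gamma$, and thus $T_N(\mu) \in \mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$. This in turn implies that the restriction of $T_N$ to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$ maps into $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$.

That the restriction of $T_N$ to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}(\gamma)$ is a contraction with respect to $\ell_2$ follows from a standard calculation, e.g., with a slight modification as in [36, Lemma 3.1]. ∎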
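The identity $\mathbb{E}[V_r^{\lambda_2 - 1}] = 1/m$ used in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes that $\lambda_2$ is the zero with positive imaginary part of $z(z+1)\cdots(z+m-2) = m!$ on the branch where the arguments of the factors sum to $2\pi$, and it locates that zero by a Newton iteration from a hand-picked starting point. The function names `indicial_root` and `mean_power` and the starting value are illustrative choices.

```python
import cmath
import math


def indicial_root(m, z0=1.5 + 2.2j, tol=1e-13, max_iter=100):
    """Newton iteration for a zero of z(z+1)...(z+m-2) = m! with positive
    imaginary part, solved in logarithmic form: the sum of log(z+j) must
    equal log(m!) + 2*pi*i (assumed branch for lambda_2)."""
    target = math.lgamma(m + 1) + 2j * math.pi
    z = z0
    for _ in range(max_iter):
        f = sum(cmath.log(z + j) for j in range(m - 1)) - target
        df = sum(1 / (z + j) for j in range(m - 1))
        step = f / df
        z -= step
        if abs(step) < tol:
            break
    return z


def mean_power(m, lam):
    """E[V^(lam-1)] for V with density (m-1)(1-x)^(m-2) on [0,1].
    The Beta integral gives (m-1)! / (lam (lam+1) ... (lam+m-2))."""
    prod = 1
    for j in range(m - 1):
        prod *= lam + j
    return math.factorial(m - 1) / prod


m = 27
lam2 = indicial_root(m)
value = mean_power(m, lam2)
print("lambda_2 =", lam2)
print("E[V^(lambda_2 - 1)] =", value, " 1/m =", 1 / m)
```

The starting point is tuned for $m = 27$; for other values of $m$ a different initial guess may be needed, since Newton's method only converges locally.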


Proof of Theorem 2.7: NPL. Denote by $\mathcal{L}(X, \Lambda)$ the unique fixed point of the restriction of $T_N$ to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}((0, \theta))$, with $\theta$ defined in (28). By Lemma 4.1, the distribution $\mathcal{L}(X, \Lambda)$ as in the statement of the theorem is well-defined. The fixed point property of $(X, \Lambda)$ implies that


X 

3fJ(n*^A) 







(34) 


where $(V_1, \dots, V_m), (X^{(1)}, \Lambda^{(1)}), \dots, (X^{(m)}, \Lambda^{(m)})$ are independent, and the $(X^{(r)}, \Lambda^{(r)})$ are identically distributed as $(X, \Lambda)$.

Define now three matrices


aW — 

Z\j. .— 

rW 

Ir 

n 

0 

{‘T 

5 

' Vr O' 

0 

— 

, 0 ^ 

1 - 

1 

0 



V ^ / J 






and write

$$\Delta(n) := \ell_2\bigl(Y_n, (X, \Re(n^{i\beta}\Lambda))\bigr).$$

To bound $\Delta(n)$, we use the following coupling between the $Y_j^{(r)}$ appearing in the recurrence (29) and the quantities appearing on the right-hand side of (34). Note that for any pair of distributions on $\mathbb{R}^2$ there always exists an optimal $\ell_2$-coupling. We first fix the random vectors $(X^{(1)}, \Lambda^{(1)}), \dots, (X^{(m)}, \Lambda^{(m)})$. Then, for each $j \ge 1$ and $r = 1, \dots, m$, we choose $Y_j^{(r)}$ as an optimal $\ell_2$-coupling to $(X^{(r)}, \Re(j^{i\beta}\Lambda^{(r)}))$ on $\mathbb{R}^2$. This can be done such that the sequences

$$\bigl(Y_j^{(1)}, (X^{(1)}, \Re(j^{i\beta}\Lambda^{(1)}))\bigr)_{j \ge 1}, \ \dots, \ \bigl(Y_j^{(m)}, (X^{(m)}, \Re(j^{i\beta}\Lambda^{(m)}))\bigr)_{j \ge 1}$$

are independent and independent of $(I^{(n)}, V_1, \dots, V_m)$. Note that these couplings and independence assumptions do not violate equations (29) and (34). Hence, we obtain


A(n) ^ 



3fJ 





2 


Using the triangle inequality and writing the components as $Y_n = (Y_{n,1}, Y_{n,2})$, we obtain


A(n) ^ 


1-Cr^m 


E (4”'(kS)‘-s(c;">(^T)’ 


E 




(n) 


xfi) 

Afi) 


3? I 


XM 

f^ir) 


E 

l^r^m 




_vfi) 

n 


+ 


5 U) _ 


The second and the fourth summand on the right-hand side tend to zero as $n \to \infty$ by (30) and (32). For the third summand, note that the asymptotic behavior of the normalized size $Y_{n,2}$ of $m$-ary search trees is covered by Theorem 1, eq. (2) in [8]. In particular, from that theorem we obtain $\sup_{n \ge 1} \|Y_{n,2}\|_2 < \infty$. Taking into account the prefactor $(I_r^{(n)})^{\alpha - 1}/n$ and conditioning on $I^{(n)}$, we find that the third summand also tends to zero.

To bound the first summand in the latter display, we write, for $r = 1, \dots, m$ and $n \ge m - 1$,


Wj:^^ := 





and denote the components of $W_r^{(n)}$ by $W_r^{(n)} = (W_{r,1}^{(n)}, W_{r,2}^{(n)})$. For $r = 1, \dots, m$, we have


r(n) 


U) 


E 


E Y 


n) 


l^r^m 


= E 


E {(»(?)' + (<?)■'}+E + w'S'w'S’} 

r^s 


(35) 


We bound the three types of terms individually. First, for the dominant term 


E 


(lAiT + (»«') 


(A\2 


= E 





n 


2(a-l) 



,2 


gfj((jW)*/3AM)j" 





































^ E 


r(n)\2(“-i) 


n 


((fS- .YM)" + (fM _ 3} ((/(»))i/SAW) 


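where we used the inequality $(I_r^{(n)}/n)^2 \le (I_r^{(n)}/n)^{2(\alpha - 1)}$. Conditioning on $I_r^{(n)}$ and using that $Y_j^{(r)}$ and $(X^{(r)}, \Re(j^{i\beta}\Lambda^{(r)}))$ are optimal couplings, we obtain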


E 




^ E 




n 


A2(j(n)) 


For the cross-product terms in (35), assume $1 \le r, s \le m$ with $r \ne s$. Note that, by independence, we have $\mathbb{E}[W_{r,1}^{(n)} W_{s,1}^{(n)}] = 0$ conditioning on $I_r^{(n)}$ and $I_s^{(n)}$. From the expansion (28), we obtain


$$\mathbb{E}[Y_n] = \begin{pmatrix} 0 \\ \Re(\theta n^{i\beta}) + R(n) \end{pmatrix},$$


with a remainder R{n) = o(l). By independence and E[A] = 6, we obtain E[PF^^ 2 ^] = 

and 


E[iyg)iyiJ] = E 


j{n) j{n)' 

Ir Is 

n n 


a —1 




which tends to 0 by the dominated convergence theorem as $R(I_r^{(n)}), R(I_s^{(n)}) \to 0$ in probability. Hence, collecting all estimates, we obtain


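$$\Delta(n) \le \Bigl\{\mathbb{E}\Bigl[\sum_{1 \le r \le m} \Bigl(\frac{I_r^{(n)}}{n}\Bigr)^{2(\alpha - 1)} \Delta^2(I_r^{(n)})\Bigr] + o(1)\Bigr\}^{1/2} + o(1). \tag{36}$$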


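Now $\Delta(n) \to 0$ follows from a standard argument since we have

$$\lim_{n \to \infty} \mathbb{E} \sum_{1 \le r \le m} \Bigl(\frac{I_r^{(n)}}{n}\Bigr)^{2(\alpha - 1)} = \mathbb{E} \sum_{1 \le r \le m} V_r^{2(\alpha - 1)} < 1$$

(cf. the proof of Theorem 4.1 in [36]). This proves Theorem 2.7 for NPL. ∎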
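The contraction constant above can be evaluated in closed form, which makes the role of the threshold $\alpha = 3/2$ visible. The sketch below is an illustration under one assumption taken from the proof of Lemma 4.1: each $V_r$ has density $(m-1)(1-x)^{m-2}$, so that $\sum_r \mathbb{E}[V_r^s] = m!\,\Gamma(s+1)/\Gamma(s+m)$. The function name `contraction_factor` is hypothetical, and $\alpha$ is treated as a free parameter here, although in the paper it is determined by $m$.

```python
import math


def contraction_factor(m, alpha):
    """Value of sum_r E[V_r^(2(alpha-1))] = m * E[V^s] with s = 2(alpha-1),
    where V has density (m-1)(1-x)^(m-2) on [0,1].
    The Beta integral gives m! * Gamma(s+1) / Gamma(s+m); requires alpha > 1/2."""
    s = 2.0 * (alpha - 1.0)
    return math.exp(math.lgamma(m + 1) + math.lgamma(s + 1) - math.lgamma(s + m))


# The factor equals 1 at alpha = 3/2 and drops below 1 only for alpha > 3/2.
for alpha in (1.4, 1.5, 1.6):
    print(alpha, contraction_factor(27, alpha))
```

Since the factor is decreasing in $s$ and equals 1 at $s = 1$, the $\ell_2$-contraction argument works exactly when $2(\alpha - 1) > 1$, which is the periodic regime treated in this subsection.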


4.2 Node path length and space requirement. II. $3 \le m \le 26$

We begin with the recurrence (12), and recall that, for $3 \le m \le 26$,

$$\mathbb{V}(S_n) \sim C_S n, \qquad \mathbb{V}(N_n) \sim C_N n^2 \quad\text{with}\quad C_N = \phi^2 C_K,$$

see (18) and (26). There exists an $n_1 \ge 1$, such that for all $n \ge n_1$, the matrix $\mathrm{Cov}(Q_n)$ is positive definite. We normalize it by $\tilde{Q}_n := Q_n$ for $0 \le n < n_1$ and by

$$(\tilde{Q}_n)^t := \begin{pmatrix} (\sqrt{C_N}\, n)^{-1} & 0 \\ 0 & (C_S n)^{-1/2} \end{pmatrix} (Q_n - \mathbb{E}[Q_n])^t, \qquad (n \ge n_1).$$


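Then, by (12), $\tilde{Q}_n$ satisfies the recurrence

$$(\tilde{Q}_n)^t \overset{d}{=} \sum_{1 \le r \le m} C_r^{(n)} \bigl(\tilde{Q}^{(r)}_{I_r^{(n)}}\bigr)^t + \tilde{b}^{(n)}, \qquad (n \ge m - 1),$$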

where (denoting by Fn,r the event Fn^r ■= ^ ni} and its complement) 



with assumptions on independence and identical distributions as in (12). Note that the asymptotic expressions for the variances and covariance between $N_n$ and $S_n$ imply that

$$\mathrm{Cov}(\tilde{Q}_n) = \mathrm{Id}_2 + o(1),$$

where $\mathrm{Id}_2$ denotes the $2 \times 2$ identity matrix and the $o(1)$-term means that all four components of $\mathrm{Cov}(\tilde{Q}_n)$ converge to the corresponding components of $\mathrm{Id}_2$, each $o(1)$ in the four components being different in general. In particular, $\mathrm{Cov}(\tilde{Q}_n)$ is a symmetric, positive definite matrix for all $n \ge n_1$. Let $R_n := \mathrm{Id}_2$ for $0 \le n < n_1$ and $R_n := (\mathrm{Cov}(\tilde{Q}_n))^{1/2}$ for $n \ge n_1$. Note that, by continuity, we have

$$R_n = \mathrm{Id}_2 + o(1), \qquad R_n^{-1} = \mathrm{Id}_2 + o(1). \tag{38}$$
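The role of the symmetric square root $R_n$ is that multiplying a centered vector by $R_n^{-1}$ turns its covariance matrix into the identity. A small numerical sketch of this step follows; the covariance entries are made-up numbers chosen close to $\mathrm{Id}_2$, and the helper names are hypothetical. It uses the closed form for the square root of a symmetric positive definite $2 \times 2$ matrix $A$, namely $(A + sI)/t$ with $s = \sqrt{\det A}$ and $t = \sqrt{\operatorname{tr} A + 2s}$.

```python
import math


def sqrt_spd_2x2(a, b, d):
    """Symmetric square root of the SPD matrix [[a, b], [b, d]],
    returned as the triple (r11, r12, r22)."""
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return ((a + s) / t, b / t, (d + s) / t)


def inv_sym_2x2(a, b, d):
    """Inverse of the symmetric matrix [[a, b], [b, d]] (again symmetric)."""
    det = a * d - b * b
    return (d / det, -b / det, a / det)


def congruence(r, c):
    """Entries (e11, e12, e21, e22) of R * C * R for symmetric 2x2
    matrices R and C given as (x11, x12, x22) triples."""
    ra, rb, rd = r
    ca, cb, cd = c
    # M = R * C, then M * R.
    m11, m12 = ra * ca + rb * cb, ra * cb + rb * cd
    m21, m22 = rb * ca + rd * cb, rb * cb + rd * cd
    return (m11 * ra + m12 * rb, m11 * rb + m12 * rd,
            m21 * ra + m22 * rb, m21 * rb + m22 * rd)


# Hypothetical covariance matrix of the form Id_2 + small perturbation.
cov = (1.02, 0.05, 0.97)
r_inv = inv_sym_2x2(*sqrt_spd_2x2(*cov))
whitened = congruence(r_inv, cov)  # R^{-1} Cov R^{-1}, should be the identity
print(whitened)
```

Because $R_n$ depends continuously on the covariance matrix, a covariance close to the identity gives $R_n$ and $R_n^{-1}$ close to the identity as well, which is the content of (38).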


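Now normalize by $Y_n := R_n^{-1}\tilde{Q}_n$, for $n \ge 1$, so that $\mathrm{Cov}(Y_n) = \mathrm{Id}_2$ for $n \ge n_1$, and

$$(Y_n)^t \overset{d}{=} \sum_{1 \le r \le m} F_r^{(n)} \bigl(Y^{(r)}_{I_r^{(n)}}\bigr)^t + b^{(n)}, \qquad (n \ge m), \tag{39}$$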


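where $F_r^{(n)} = R_n^{-1} C_r^{(n)} R_{I_r^{(n)}}$ and $b^{(n)} = R_n^{-1}\tilde{b}^{(n)}$, with assumptions on independence and identical distributions as in (12). From (37), (38) and (30), we then obtain the convergences

$$F_r^{(n)} \longrightarrow \begin{pmatrix} V_r & 0 \\ 0 & \sqrt{V_r} \end{pmatrix} =: F_r^*, \qquad b^{(n)} \longrightarrow \begin{pmatrix} C_N^{-1/2} b_N \\ 0 \end{pmatrix} =: b_N^*, \tag{40}$$

which hold in $L_p$ for any $1 \le p < \infty$ (we will need $p = 3$ below).

The limiting map. To describe the asymptotic behavior of $Q_n$, we use the following probability distribution on the space $\mathbb{R}^2$. In accordance with the notation in [39], we denote by $\mathcal{M}^2$ the space of all probability distributions on $\mathbb{R}^2$, by $\mathcal{M}_3^2$ the subspace of all $\mathcal{L}(Z) \in \mathcal{M}^2$ with $\|Z\|_3 < \infty$, and furthermore

$$\mathcal{M}_3^2(0, \mathrm{Id}_2) := \bigl\{\mathcal{L}(Z) \in \mathcal{M}_3^2 \,\big|\, \mathbb{E}[Z] = 0,\ \mathrm{Cov}(Z) = \mathrm{Id}_2\bigr\}.$$

Define the map $T'_N$ on $\mathcal{M}^2$:

$$T'_N : \mathcal{M}^2 \to \mathcal{M}^2, \qquad \mathcal{L}(Z) \mapsto \mathcal{L}\Bigl(\sum_{1 \le r \le m} F_r^* Z^{(r)} + b_N^*\Bigr), \tag{41}$$

where $Z^{(1)}, \dots, Z^{(m)}, (F_1^*, \dots, F_m^*, b_N^*)$ are independent and $Z^{(r)}$ is distributed as $Z$ for all $r = 1, \dots, m$. Here $F_r^*$ and $b_N^*$ are defined in (40).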

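Lemma 4.2. The restriction of $T'_N$ in (41) to $\mathcal{M}_3^2(0, \mathrm{Id}_2)$ has a unique fixed point $\mathcal{L}(X', \Lambda')$ which is a product measure, i.e., its components $X'$ and $\Lambda'$ are independent.

Proof. We check first that the restriction of $T'_N$ to $\mathcal{M}_3^2(0, \mathrm{Id}_2)$ maps into $\mathcal{M}_3^2(0, \mathrm{Id}_2)$:

• For any $\mu \in \mathcal{M}_3^2(0, \mathrm{Id}_2)$, we see, by independence and $\|b_N\|_3 < \infty$, that $T'_N(\mu) \in \mathcal{M}_3^2$.

• For the mean of $T'_N(\mu)$, we have, from $\mathbb{E}[b_N] = 0$, that $T'_N(\mu)$ is centered.

• For the covariance of $T'_N(\mu)$, we obtain (see also [39, Lemma 3.2]) the matrix

$$\mathbb{E}\begin{pmatrix} b_N^2/C_N & 0 \\ 0 & 0 \end{pmatrix} + m\,\mathbb{E}\begin{pmatrix} V_1^2 & 0 \\ 0 & V_1 \end{pmatrix} = \mathrm{Id}_2. \tag{42}$$

Thus $T'_N(\mu) \in \mathcal{M}_3^2(0, \mathrm{Id}_2)$. By Lemma 3.3 in [39], the existence of a unique fixed point $\mathcal{L}(X', \Lambda')$ follows from the inequality

$$m\,\mathbb{E}\|F_1^*\|_{\mathrm{op}}^3 = m\,\mathbb{E}\bigl[V_1^{3/2}\bigr] < 1.$$

Alternatively, Theorem 5.1 in [11] (or Lemma 3.1 in [39] as well) implies the existence of a unique fixed point $\mathcal{L}(X', \Lambda')$ in $\mathcal{M}_3^2(0, \mathrm{Id}_2)$.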
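The moment computations behind (42) and the contraction inequality can be written out explicitly. The following is a worked sketch, assuming only that $V_1$ has density $(m-1)(1-x)^{m-2}$ on $[0,1]$, as stated in the proof of Lemma 4.1; the general formula for $\mathbb{E}[V_1^s]$ is a standard Beta integral.

```latex
\begin{align*}
\mathbb{E}\bigl[V_1^{s}\bigr]
  &= (m-1)\int_0^1 x^{s}(1-x)^{m-2}\,dx
   = \frac{(m-1)!\,\Gamma(s+1)}{\Gamma(s+m)}, \qquad s > -1, \\
m\,\mathbb{E}[V_1] &= \frac{m!\,\Gamma(2)}{\Gamma(m+1)} = 1, \\
m\,\mathbb{E}\bigl[V_1^{2}\bigr] &= \frac{m!\,\Gamma(3)}{\Gamma(m+2)} = \frac{2}{m+1}, \\
m\,\mathbb{E}\bigl[V_1^{3/2}\bigr] &= \frac{m!\,\Gamma(5/2)}{\Gamma(m+3/2)} < 1 .
\end{align*}
```

The last inequality holds because $s \mapsto m!\,\Gamma(s+1)/\Gamma(s+m)$ is decreasing for $m \ge 2$ and equals 1 at $s = 1$. The second line gives the lower-right entry of (42), and the third line shows that the upper-left entry of (42) amounts to $\mathbb{E}[b_N^2]/C_N = (m-1)/(m+1)$.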

To show that $\mathcal{L}(X', \Lambda')$ is a product measure we recall that the existence of the unique fixed point that we just obtained is based on the fact that the restriction of $T'_N$ to $\mathcal{M}_3^2(0, \mathrm{Id}_2)$ is a contraction with respect to a complete metric on $\mathcal{M}_3^2(0, \mathrm{Id}_2)$. We do not introduce this metric, the Zolotarev metric $\zeta_3$, here, since we do not require the special description of $\zeta_3$. For more information on $\zeta_3$, in particular the completeness of the metric space $(\mathcal{M}_3^2(0, \mathrm{Id}_2), \zeta_3)$, see [11]. We denote the space of probability measures on $\mathbb{R}$ by $\mathcal{M}$ and

$$\mathcal{M}_3(0, 1) := \bigl\{\mathcal{L}(Z) \in \mathcal{M} \,\big|\, \mathbb{E}[|Z|^3] < \infty,\ \mathbb{E}[Z] = 0,\ \mathbb{V}(Z) = 1\bigr\}.$$

Furthermore, we denote the product of probability measures $\nu_1$ and $\nu_2$ on $\mathbb{R}$ by $\nu_1 \otimes \nu_2$. Consider the space

$$\mathcal{G} := \{\nu_1 \otimes \mathcal{N}(0, 1) \mid \nu_1 \in \mathcal{M}_3(0, 1)\}.$$

Then $\mathcal{G} \subset \mathcal{M}_3^2(0, \mathrm{Id}_2)$.

To show that $(\mathcal{G}, \zeta_3)$ is a closed subspace of $(\mathcal{M}_3^2(0, \mathrm{Id}_2), \zeta_3)$, let $(\mu_n \otimes \mathcal{N}(0, 1))_{n \ge 1}$ be a sequence in $\mathcal{G}$ that converges in $(\mathcal{M}_3^2(0, \mathrm{Id}_2), \zeta_3)$, say to $\mathcal{L}(Y_1, Y_2)$. Since $\zeta_3$-convergence implies weak convergence, we first obtain that $Y_2$ is standard normally distributed. Clearly, we have $\mathcal{L}(Y_1) \in \mathcal{M}_3(0, 1)$. Since a weak limit of product measures is a product measure (see e.g. [2, Theorem 2.8(ii)]), $\mathcal{L}(Y_1, Y_2)$ is a product measure. Now $(\mathcal{G}, \zeta_3)$ as a closed subspace of the complete space $(\mathcal{M}_3^2(0, \mathrm{Id}_2), \zeta_3)$ is complete.

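We next show that the restriction of $T'_N$ to $\mathcal{G}$ maps to $\mathcal{G}$. Note that only here do we use the fact that the second component in the definition of $\mathcal{G}$ is a normal distribution; see (43) below. For $\mu = \mu_1 \otimes \mathcal{N}(0, 1) \in \mathcal{G}$, the covariance matrix of $T'_N(\mu) =: \mathcal{L}(Y_1, Y_2)$ is $\mathrm{Id}_2$ by (42). Since $Y_2$ is distributed as $\sum_{1 \le r \le m} \sqrt{V_r}\, N_r$, where the $N_r$'s are independent standard normals and independent of $(V_1, \dots, V_m)$, we see that $\mathcal{L}(Y_2) = \mathcal{N}(0, 1)$. Thus it remains to show that, for $T'_N(\mu) \in \mathcal{G}$, the components $Y_1$ and $Y_2$ are independent. Let $A, B \subset \mathbb{R}$ be measurable and $(Z^{(1)}, W^{(1)}), \dots, (Z^{(m)}, W^{(m)})$ be independent random vectors that are independent of $(V_1, \dots, V_m)$ and identically distributed as $\mu$. Then, denoting the distribution of $V = (V_1, \dots, V_m)$ by $\Upsilon$ and, for $v = (v_1, \dots, v_m)$, writing $t_N(v) := C_N^{-1/2}\bigl(\phi + 2\phi^2 \sum_{1 \le r \le m} v_r \log v_r\bigr)$, we have

$$\begin{aligned}
\mathbb{P}(Y_1 \in A, Y_2 \in B)
&= \int \mathbb{P}\Bigl(\sum_{1 \le r \le m} v_r Z^{(r)} + t_N(v) \in A,\ \sum_{1 \le r \le m} \sqrt{v_r}\, W^{(r)} \in B\Bigr)\, d\Upsilon(v) \\
&= \int \mathbb{P}\Bigl(\sum_{1 \le r \le m} v_r Z^{(r)} + t_N(v) \in A\Bigr)\, \mathbb{P}\bigl(\mathcal{N}(0, 1) \in B\bigr)\, d\Upsilon(v) \\
&= \mathbb{P}(Y_1 \in A)\, \mathbb{P}(Y_2 \in B).
\end{aligned} \tag{43}$$

Here the second equality uses that, for $\mu \in \mathcal{G}$, the components $Z^{(r)}$ and $W^{(r)}$ are independent and that $\sum_r \sqrt{v_r}\, W^{(r)}$ is standard normal for every fixed $v$ with $\sum_r v_r = 1$. We then deduce that $T'_N(\mu) \in \mathcal{G}$ and $T'_N$ maps $\mathcal{G}$ to $\mathcal{G}$.

Finally, Banach's fixed point theorem implies that the restriction of $T'_N$ to $\mathcal{G}$ has a unique fixed point. Since $\mathcal{G} \subset \mathcal{M}_3^2(0, \mathrm{Id}_2)$, we find $\mathcal{L}(X', \Lambda') \in \mathcal{G}$. Consequently, $X'$ and $\Lambda'$ are independent. ∎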


Proof of Theorem 2.6: NPL. The proof of Theorem 2.6 relies on Theorem 4.1 in [39]. The parameter $d$ there is taken to be the dimension $d = 2$ here, and we choose the parameter $s = 3$. Note that the normalization in (10) is as required in [39, eq. (22)] and is identical to the normalization leading to the $Y_n$ in (39). We need to check the conditions (24)-(26) in [39]. Condition (24) in our case is, with $F_r^{(n)}$ and $b^{(n)}$ as in (40),

$$\bigl(F_1^{(n)}, \dots, F_m^{(n)}, b^{(n)}\bigr) \longrightarrow \bigl(F_1^*, \dots, F_m^*, b_N^*\bigr)$$

in $L_3$. This is satisfied by (40). Condition (25) in our case is also satisfied because

$$\sum_{1 \le r \le m} \mathbb{E}\|F_r^*\|_{\mathrm{op}}^3 = m\,\mathbb{E}\bigl[V_1^{3/2}\bigr] < 1.$$

Finally, condition (26) is, for all $r = 1, \dots, m$ and all $\ell \in \mathbb{N}$,

$$\mathbb{E}\Bigl[\mathbf{1}_{\{I_r^{(n)} \le \ell\}} \|F_r^{(n)}\|_{\mathrm{op}}^3\Bigr] \longrightarrow 0.$$

Since $\|F_r^{(n)}\|_{\mathrm{op}}$ are uniformly bounded random variables, this condition is equivalent to

$$\mathbb{P}\bigl(I_r^{(n)} \le \ell\bigr) \longrightarrow 0,$$

which is satisfied in view of (30). Hence, Theorem 4.1 in [39] applies and implies the convergence $\mathrm{Cov}(Q_n)^{-1/2}(Q_n - \mathbb{E}[Q_n]) \to (X', \Lambda')$ in the metric $\zeta_3$, which implies the stated convergence in distribution.

Note that the components of $T'_N$ imply univariate recursive distributional equations for $\mathcal{L}(\Lambda')$ and $\mathcal{L}(X')$:

$$\Lambda' \overset{d}{=} \sum_{1 \le r \le m} \sqrt{V_r}\, \Lambda'^{(r)}, \qquad X' \overset{d}{=} \sum_{1 \le r \le m} V_r X'^{(r)} + C_N^{-1/2} b_N,$$

with conditions on independence and identical distributions corresponding to the definition of $T'_N$. Moreover, both equations are subject to the constraints of zero mean, unit variance and bounded third absolute moment. The solution for $\mathcal{L}(\Lambda')$ is given by the standard normal distribution, and a comparison of the equation for $\mathcal{L}(X')$ with (33) shows that $X'$ is identically distributed as $C_N^{-1/2} X$ with $X$ as in Theorem 2.7.


4.3 Limit law for NPL

From the previous two subsections, we see that the limit law of $(N_n - \mathbb{E}(N_n))/n$ is the unique solution, subject to zero mean and finite variance, of the recursive distributional equation

$$X \overset{d}{=} \sum_{1 \le r \le m} V_r X^{(r)} + \phi + 2\phi^2 \sum_{1 \le r \le m} V_r \log V_r,$$

where $X^{(1)}, \dots, X^{(m)}, V$ are independent and the $X^{(r)}$ have the same distribution as $X$.

Moreover, it is well-known (see Corollary 5.2 in [38]) that the limit law of $(K_n - \mathbb{E}(K_n))/n$, which we denote by $\mathcal{L}(K)$ in Section 2.2, is the unique solution, again subject to zero mean and finite variance, of

$$X \overset{d}{=} \sum_{1 \le r \le m} V_r X^{(r)} + 1 + 2\phi \sum_{1 \le r \le m} V_r \log V_r, \tag{44}$$

where the meaning of the notations is as above.

Comparing these two distributional recurrences, we see that the solution to the first one is $\mathcal{L}(\phi K)$. Thus, we have

$$\frac{N_n - \mathbb{E}(N_n)}{n} \overset{d}{\longrightarrow} \phi K,$$

i.e., the limit laws of $N_n$ and $K_n$ are identical up to a constant factor. In fact, if one is only interested in this result, then one does not need the analysis in the last two subsections but there are simpler approaches, as we discuss below.
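The comparison of the two fixed-point equations amounts to a one-line substitution. As a sketch, assuming the equations have the form stated in this subsection (the NPL equation with additive term $\phi + 2\phi^2\sum_r V_r\log V_r$ and (44) with additive term $1 + 2\phi\sum_r V_r\log V_r$), multiplying (44) by $\phi$ gives:

```latex
\begin{align*}
K &\overset{d}{=} \sum_{1\le r\le m} V_r K^{(r)} + 1 + 2\phi \sum_{1\le r\le m} V_r \log V_r \\
\Longrightarrow\qquad
\phi K &\overset{d}{=} \sum_{1\le r\le m} V_r \bigl(\phi K^{(r)}\bigr) + \phi + 2\phi^2 \sum_{1\le r\le m} V_r \log V_r .
\end{align*}
```

So $\phi K$ solves the equation for the NPL limit, with $\phi K^{(r)}$ playing the role of $X^{(r)}$. Scaling by the constant $\phi$ preserves zero mean and finite variance, so uniqueness of the solution under these constraints identifies the limit law as $\mathcal{L}(\phi K)$.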
4.4 Short proofs for the limit law of NPL

In this section, we discuss different means of proving directly the limit law for NPL without the detour via the bivariate setting from Sections 4.1 and 4.2.

Limit law for NPL by the contraction method. A first alternative approach to the limit law for NPL uses the contraction method and "over-normalizing" in recurrence (12). More precisely, normalize with an $\alpha < \alpha' < 1$ by



Now the recurrence (12) leads to the limit equation 



(45) 


with conditions on independence and identical distributions as in (33). Theorem 4.1 in [36] directly applies and implies that $\mathcal{R}_n \to \mathcal{R}$ in distribution and with second (mixed) moments, where $\mathcal{R}$ is the unique fixed point subject to zero mean and finite second moment of the recursive distributional equation (45). By substituting into (45), we see that $(\phi K, 0)$ has the distribution of $\mathcal{R}$, which implies that

$$\frac{N_n - \mathbb{E}[N_n]}{n} \overset{d}{\longrightarrow} \phi K.$$


Univariate limit law for NPL via Slutsky’s theorem. Another approach is to apply Slutsky’s 
theorem. For that purpose, we consider the moment generating function 



Then $P_n(u, v, w)$ satisfies the recurrence



with the initial conditions $P_n(u, v, w) = 1$ for $0 \le n \le m - 2$. Now define

$$V_n^{[KN]} := \mathrm{Cov}(K_n, N_n).$$


Then 


= m 



where 
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Observe that Lemma 3.3 and Lemma 3.3, together with the asymptoties of Vn^\ imply that 

1 








/ n \ / j J J / n \ / 

Vm—1/ j \m—lJ j 

Consequently, by the same method of proofs used in Section 3, we see that 


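Now consider the difference, with $\bar{K}_n := K_n - \mathbb{E}(K_n)$ and $\bar{N}_n := N_n - \mathbb{E}(N_n)$,

$$\begin{aligned}
\mathbb{E}\bigl(\phi \bar{K}_n - \bar{N}_n\bigr)^2
&= \phi^2\, \mathbb{V}(K_n) - 2\phi\, \mathrm{Cov}(K_n, N_n) + \mathbb{V}(N_n) \\
&= \phi^2 C_K n^2 - 2\phi^2 C_K n^2 + \phi^2 C_K n^2 + o(n^2) \\
&= o(n^2).
\end{aligned}$$

Consequently, by Chebyshev's inequality, we obtain the convergence in probability

$$\frac{\bar{N}_n - \phi \bar{K}_n}{n} \overset{P}{\longrightarrow} 0.$$

From this, the claimed result follows from Slutsky's theorem and the limit law for KPL.

Note that this argument in addition gives the following consequence.

Corollary 4.3. The correlation coefficient between $K_n$ and $N_n$ tends asymptotically to one:

$$\rho(K_n, N_n) \longrightarrow 1.$$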

Identical limit random variables. To the pair $(N_n, K_n)$, we could as well apply the contraction method, and prove that the normalization $((N_n - \mathbb{E}(N_n))/n, (K_n - \mathbb{E}(K_n))/n)$ converges to a limit given by


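$$(\mathcal{P})^t \overset{d}{=} \sum_{1 \le r \le m} \begin{pmatrix} V_r & 0 \\ 0 & V_r \end{pmatrix} \bigl(\mathcal{P}^{(r)}\bigr)^t + \begin{pmatrix} \phi\, b_K \\ b_K \end{pmatrix}, \qquad b_K := 1 + 2\phi \sum_{1 \le r \le m} V_r \log V_r,$$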


with conditions on independence and identical distributions as in (33) and subject to zero mean and finite second moment. By plugging in, we find that $(\phi K, K)$ has the limit distribution. This re-derives Corollary 4.3 and shows that the limit random variables (up to scaling) are even almost surely identical. It seems reasonable to conjecture that the sequences

$$\frac{N_n - \mathbb{E}[N_n]}{\phi n} \qquad\text{and}\qquad \frac{K_n - \mathbb{E}[K_n]}{n}$$

both converge almost surely to the same random variable with the distribution of $K$. This requires the $m$-ary search trees to grow as a combinatorial Markov chain, which canonically is obtained by building up the tree from i.i.d. data uniformly distributed on $[0, 1]$. For the notion of a combinatorial Markov chain and related results on binary search trees, see Grübel [19].
5 Extensions


The dependence and phase changes we established above for space requirement and path lengths in random $m$-ary search trees are not confined to these shape parameters, nor are they specific to $m$-ary search trees. The same study (including the same methods of proof) can be carried out for other shape parameters and other classes of random trees. We consider first random median-of-$(2t+1)$ search trees in this section, where we discuss the joint asymptotics of size (defined as the number of nodes with at least $2t$ descendants) and total key path length (which is also the major cost measure for Quicksort using the median-of-$(2t+1)$ technique). Random quadtrees will also be briefly discussed. Then we consider another line of extension, namely, to other shape parameters in these trees. Since the technicalities follow more or less the same pattern, we skip all proofs.

5.1 Random fringe-balanced binary search trees

Fringe-balanced binary search trees (FBBSTs) are binary search trees ($m = 2$) with local re-organizations for all subtrees of size exactly $2t + 1$ into more balanced ones. In terms of quicksort, the corresponding tree structures choose at each partitioning stage the median of a sample of $2t + 1$ elements to partition the elements into smaller and larger groups. For a precise description and other connections, see [8, 10]. The number of nodes $S_n$ with at least $2t$ descendants (or the number of median-partitioning stages) and the total path length $X_n$ of these nodes (TPL; KPL=NPL for binary search trees) of a random FBBST constructed from a random permutation of $n$ elements satisfy the following distributional recurrence ($Q_n := (X_n, S_n)$)



$(n \ge 2t + 1)$,


with conditions on independence and identical distributions as in (12) and the initial conditions $S_0 = \cdots = S_{2t} = X_0 = \cdots = X_{2t} = 0$. Here


Wn = J) 







We start with the mean. First, for $S_n$, it was proved in [8] that

$$\mathbb{E}(S_n) = C_1(n + 1) - 1 + \sum_{2 \le k \le 3} \frac{C_k}{\Gamma(\rho_k)}\, n^{\rho_k - 1} + o(n^{\alpha_t - 1}), \tag{46}$$

where

$$C_k = \frac{t!}{2(\rho_k - 1)\rho_k \cdots (\rho_k + t - 1) \sum_{t \le j \le 2t} \frac{1}{j + \rho_k}}, \qquad (k = 1, \dots, t + 1),$$

with $\rho_1 = 2 > \Re(\rho_2) = \Re(\rho_3) = \alpha_t > \Re(\rho_4) \ge \cdots \ge \Re(\rho_{t+1})$ being the zeros of the indicial equation

$$(z + t) \cdots (z + 2t) = \frac{2(2t + 1)!}{t!}.$$

In particular,

$$C_1 = \frac{1}{2(t + 1)(H_{2t+2} - H_{t+1})}.$$

Moreover, using the transfer theorems from [8], we obtain, for the mean of $X_n$,

$$\mathbb{E}(X_n) = \frac{1}{H_{2t+2} - H_{t+1}}\, n \log n + c_t n + o(n),$$

for some constant $c_t$. The same method of proofs (asymptotic transfer and the approach used in Section 3.3) also leads to asymptotic estimates for the variances and the covariance between $X_n$ and $S_n$.
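The two leading constants are simple functions of harmonic numbers and can be tabulated exactly. The sketch below assumes the closed forms $C_1 = 1/(2(t+1)(H_{2t+2} - H_{t+1}))$ and $1/(H_{2t+2} - H_{t+1})$ for the $n \log n$ coefficient of $\mathbb{E}(X_n)$; the function names are illustrative. The value $t = 0$ is included only as a sanity check, since it corresponds to ordinary binary search trees.

```python
from fractions import Fraction


def harmonic(n):
    """Harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def size_constant(t):
    """C_1 = 1 / (2 (t+1) (H_{2t+2} - H_{t+1}))."""
    return 1 / (2 * (t + 1) * (harmonic(2 * t + 2) - harmonic(t + 1)))


def path_length_constant(t):
    """Coefficient of n log n in E(X_n): 1 / (H_{2t+2} - H_{t+1})."""
    return 1 / (harmonic(2 * t + 2) - harmonic(t + 1))


for t in range(4):
    print(t, size_constant(t), path_length_constant(t))
```

For $t = 0$ this gives the coefficient 2 of the classical $2 n \log n$ for quicksort, and for $t = 1$ (median-of-three) the coefficient $12/7$.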

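Theorem 5.1. The variance of the number of non-leaf nodes $S_n$ and that of the TPL $X_n$ in a random FBBST, and their covariance satisfy

$$\mathbb{V}(S_n) \sim \begin{cases} D_S\, n, & \text{if } 1 \le t \le 58; \\ G_1(\beta_t \log n)\, n^{2\alpha_t - 2}, & \text{if } t \ge 59, \end{cases}$$

$$\mathrm{Cov}(S_n, X_n) \sim \begin{cases} D_R\, n, & \text{if } 1 \le t \le 28; \\ G_2(\beta_t \log n)\, n^{\alpha_t}, & \text{if } t \ge 29, \end{cases}$$

$$\mathbb{V}(X_n) \sim D_X\, n^2,$$

where $D_S$, $D_R$ are suitable constants, $\beta_t = \Im(\rho_2)$, and all other constants and functions are given below.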

The periodic functions in the above theorem are given by 


Gi{z) 


\C 2 

|r(^2)l 


+ 23fJ 


, 2(2t + l)!|r(g2 + t)P \ 

^ t\‘^V{2at + 2t) - 2t\{2t + l)!r(2at + t-l)) 

( CW" „_ 2(2t + l)!rte + ir _ 

vr(e2)H «!"r(2e2 + 2«)-2«!(2« + i)!r(2ft + «-i) 


and 


G 2 {z) = ^ 


G 2 D^ / g 2 + 2 t + 1 
r(^ 2 ) Y t + 1 


{g 2 + 2t + l)'ip{g 2 + 2t + 2) — {g 2 + t)'ip{g 2 + f + 1 ) — (f + — 7 ) 

{t + l){H2t+2 — Ht+i) 


respectively. Moreover, we have


Dx = 


{H2t+2 - H, 


t+1 ) 


2t + 3 ( 2 ) 

t + 1 


^ + ^ H-( 2 ) 

t + 1 


The limit law for the normalized TPL of random FBBSTs was first shown in the dissertation of Bruhn [5]; see also [4, 8, 34, 40]. The phase change of the limit law of the normalized $S_n$ was first discovered in [8].

To describe the joint limiting behavior of $S_n$ and $X_n$, we denote by $V$ a random variable that is the median of $(2t + 1)$ independent, identically distributed uniform $[0, 1]$ random variables, i.e., $V$ has a Beta$(t + 1, t + 1)$ distribution. We define the map $T_{\mathrm{med}}$ by


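$$T_{\mathrm{med}} : \mathcal{M}^{\mathbb{R} \times \mathbb{C}} \to \mathcal{M}^{\mathbb{R} \times \mathbb{C}},$$

$$\mathcal{L}(Z, W) \mapsto \mathcal{L}\left(\begin{pmatrix} V & 0 \\ 0 & V^{\rho_2 - 1} \end{pmatrix} \begin{pmatrix} Z^{(1)} \\ W^{(1)} \end{pmatrix} + \begin{pmatrix} 1 - V & 0 \\ 0 & (1 - V)^{\rho_2 - 1} \end{pmatrix} \begin{pmatrix} Z^{(2)} \\ W^{(2)} \end{pmatrix} + \begin{pmatrix} b_{\mathrm{med}} \\ 0 \end{pmatrix}\right),$$

with conditions on independence and distributions as in (33) and

$$b_{\mathrm{med}} := 1 + \frac{1}{H_{2t+2} - H_{t+1}} \bigl(V \log V + (1 - V) \log(1 - V)\bigr).$$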

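Then Lemma 4.1 and its proof also apply to the map $T_{\mathrm{med}}$ as long as $t \ge 59$. The normalization used is given by

$$\mathcal{Y}_n := \Bigl(\frac{X_n - \mathbb{E}[X_n]}{n},\ \frac{S_n - C_1 n}{n^{\alpha_t - 1}}\Bigr), \qquad (n \ge 1). \tag{47}$$

We have the following asymptotic behavior for $t \ge 59$. Rewrite (46) as

$$\mathbb{E}(S_n) = C_1(n + 1) - 1 + \Re(\vartheta n^{\rho_2 - 1}) + o(n^{\alpha_t - 1}), \tag{48}$$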


where -d := 23fJ(C2/r(^2))- 

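Theorem 5.2. Assume $t \ge 59$. Let $\mathcal{Y}_n$ be the normalization of TPL and the number of non-leaf nodes in a random FBBST defined in (47). Denote by $\mathcal{L}(X_{\mathrm{med}}, \Lambda_{\mathrm{med}})$ the unique fixed point of the restriction of $T_{\mathrm{med}}$ to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}((0, \vartheta))$ with $\vartheta$ defined in (48). Then, denoting by $\beta_t := \Im(\rho_2)$, we have

$$\ell_2\bigl(\mathcal{Y}_n, (X_{\mathrm{med}}, \Re(n^{i\beta_t}\Lambda_{\mathrm{med}}))\bigr) \to 0, \qquad (n \to \infty).$$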

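For the range $1 \le t \le 58$, we define $b'_{\mathrm{med}} := (D_X^{-1/2} b_{\mathrm{med}}, 0)^t$ and the map $T'_{\mathrm{med}}$ on $\mathcal{M}^2$:

$$T'_{\mathrm{med}} : \mathcal{M}^2 \to \mathcal{M}^2,$$

$$\mathcal{L}(Z, W) \mapsto \mathcal{L}\left(\begin{pmatrix} V & 0 \\ 0 & \sqrt{V} \end{pmatrix} \begin{pmatrix} Z^{(1)} \\ W^{(1)} \end{pmatrix} + \begin{pmatrix} 1 - V & 0 \\ 0 & \sqrt{1 - V} \end{pmatrix} \begin{pmatrix} Z^{(2)} \\ W^{(2)} \end{pmatrix} + b'_{\mathrm{med}}\right),$$

with conditions on independence and distributions as in (41). Again Lemma 4.2 and its proof apply to $T'_{\mathrm{med}}$ and imply that the restriction of $T'_{\mathrm{med}}$ to $\mathcal{M}_3^2(0, \mathrm{Id}_2)$ has a unique fixed point $\mathcal{L}(X'_{\mathrm{med}}, \Lambda'_{\mathrm{med}})$.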

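Similar to the small $m$ case of $m$-ary search trees, the remaining range $1 \le t \le 58$ also leads to a convergence in distribution.

Theorem 5.3. Assume $1 \le t \le 58$. Let $Q_n = (X_n, S_n)$ be the vector of TPL and the number of non-leaf nodes in a random FBBST. With $\mathcal{L}(X'_{\mathrm{med}}, \Lambda'_{\mathrm{med}})$ as above, we have

$$\mathrm{Cov}(Q_n)^{-1/2}(Q_n - \mathbb{E}[Q_n]) \overset{d}{\longrightarrow} (X'_{\mathrm{med}}, \Lambda'_{\mathrm{med}}),$$

where $\Lambda'_{\mathrm{med}}$ has a standard normal distribution. Moreover, $X'_{\mathrm{med}}$ and $\Lambda'_{\mathrm{med}}$ are independent.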
5.2 Random quadtrees

Point quadtrees, first proposed by Finkel and Bentley [15], are one of the most natural extensions of binary search trees to multivariate data in which each point splits the $d$-dimensional space into $2^d$ subspaces, corresponding to $2^d$ subtrees in the corresponding tree structure. For a precise definition of random $d$-dimensional quadtrees, see [7, 30]. Since the space requirement is a constant, we discuss the number of leaves $L_n$ and the internal path length $\Xi_n$ in this section. Note that for the pair $W_n := (\Xi_n, L_n)$, we have, for all $n \ge 2$,

$$(W_n)^t \overset{d}{=} \sum_{1 \le r \le 2^d} \bigl(W^{(r)}_{I_r^{(n)}}\bigr)^t + \begin{pmatrix} n - 1 \\ 0 \end{pmatrix},$$

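with conditions on independence and identical distributions as in (12), where the initial conditions are $L_0 = 0$, $L_1 = 1$, $\Xi_0 = \Xi_1 = 0$. Moreover, the underlying splitting probabilities are given by

$$\mathbb{P}\bigl(I_1^{(n)} = j_1, \dots, I_{2^d}^{(n)} = j_{2^d}\bigr) = \binom{n - 1}{j_1, \dots, j_{2^d}} \int_{[0,1]^d} q_1(\mathbf{x})^{j_1} \cdots q_{2^d}(\mathbf{x})^{j_{2^d}}\, d\mathbf{x},$$

where $j_1, \dots, j_{2^d} \ge 0$, $j_1 + \cdots + j_{2^d} = n - 1$, $\mathbf{x} = (x_1, \dots, x_d)$ and

$$q_h(\mathbf{x}) = \prod_{1 \le l \le d} \bigl((1 - b_l) x_l + b_l (1 - x_l)\bigr),$$

with $(b_1, \dots, b_d)_2$ being the binary representation of $h - 1$.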
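The product formula for the quadrant volumes $q_h(\mathbf{x})$ is easy to evaluate directly. The following sketch (with the hypothetical helper name `quadrant_volumes`) enumerates the bit vectors $(b_1, \dots, b_d)$ in the order of the binary representation of $h - 1$, with $b_1$ as the most significant bit, which is an assumption about the intended ordering. It uses exact fractions so that the volumes can be seen to sum to 1.

```python
from fractions import Fraction
from itertools import product as cartesian


def quadrant_volumes(x):
    """Volumes q_1(x), ..., q_{2^d}(x) of the 2^d quadrants cut out of the
    unit cube by the point x = (x_1, ..., x_d)."""
    d = len(x)
    volumes = []
    for bits in cartesian((0, 1), repeat=d):  # bits = (b_1, ..., b_d)
        vol = Fraction(1)
        for b, xl in zip(bits, x):
            vol *= (1 - b) * xl + b * (1 - xl)
        volumes.append(vol)
    return volumes


x = (Fraction(1, 3), Fraction(1, 4))
vols = quadrant_volumes(x)
print(vols, sum(vols))
```

For $d = 2$ and $\mathbf{x} = (1/3, 1/4)$ the four volumes are $x_1 x_2$, $x_1(1 - x_2)$, $(1 - x_1)x_2$ and $(1 - x_1)(1 - x_2)$, in that order.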

First, it was proved in [7] that the mean of $L_n$ satisfies, for $d \ge 2$,

$$\mathbb{E}(L_n) = \chi_d\, n + c_+ n^{\alpha + i\beta} + c_- n^{\alpha - i\beta} + o(n^{\alpha}), \tag{49}$$

where $\chi_d$, $c_+$, $c_-$ (which is the conjugate of $c_+$) are given in [7], and $2e^{2\pi i/d} = \alpha + 1 + i\beta$. Moreover, the asymptotic transfer results in [7] also lead to the asymptotic approximation (see also [16])

$$\mathbb{E}(\Xi_n) = \frac{2}{d}\, n \log n + c\, n + o(n),$$

for some explicitly computable constant $c$. In a similar manner, we can characterize the asymptotics of the variances and the covariance.

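Theorem 5.4. For the number of leaves $L_n$ and the internal path length $\Xi_n$ in random $d$-dimensional quadtrees, we have

$$\mathbb{V}(L_n) \sim \begin{cases} E_L\, n, & \text{if } 1 \le d \le 8; \\ P_1(\beta \log n)\, n^{2\alpha}, & \text{if } d \ge 9, \end{cases}$$

$$\mathrm{Cov}(\Xi_n, L_n) \sim \begin{cases} E_R\, n, & \text{if } 1 \le d \le 5; \\ P_2(\beta \log n)\, n^{\alpha + 1}, & \text{if } d \ge 6, \end{cases}$$

$$\mathbb{V}(\Xi_n) \sim E_X\, n^2,$$

where $E_L$, $E_R$ are suitable constants, $\beta := 2\sin(2\pi/d)$, and all other constants and functions are given below.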
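The threshold $d = 8$ versus $d = 9$ for the variance of the number of leaves can be read off from the exponent $\alpha$. The sketch below assumes the relation $\alpha + 1 + i\beta = 2e^{2\pi i/d}$, i.e. $\alpha = 2\cos(2\pi/d) - 1$ and $\beta = 2\sin(2\pi/d)$, which is how the constants of [7] are understood here; the periodic regime with order $n^{2\alpha}$ requires $2\alpha > 1$. The function names are illustrative.

```python
import math


def alpha(d):
    """Real part alpha = 2 cos(2 pi / d) - 1 (assumed relation, d >= 2)."""
    return 2.0 * math.cos(2.0 * math.pi / d) - 1.0


def beta(d):
    """Imaginary part beta = 2 sin(2 pi / d)."""
    return 2.0 * math.sin(2.0 * math.pi / d)


# alpha crosses 1/2 between d = 8 and d = 9.
for d in range(2, 12):
    print(d, round(alpha(d), 4), round(beta(d), 4), alpha(d) > 0.5)
```

With these formulas $\alpha \approx 0.414$ for $d = 8$ and $\alpha \approx 0.532$ for $d = 9$, consistent with the change of regime in the variance of $L_n$ stated in the theorem.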
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The periodic functions above are given by 


Pi{z) = 2 


{ 2 a + iy 


(2a + 1)'^ - 2 ^ 
+ 23fJ 


|c+|^Ci(Q; + i/3, d — i/3) 

(2a + 2ty + ir ^ 


{2a + 2i/3+ 1Y-2<^ 
where cl(u, v) = 1 — t](0, u) — r](0, v) + 2 '^r](u, v) with 


c+cl(q; + i/3, d + i/3)e^ 


7 ](u,v) : = 


+ 


and 


where 


Finally, 


P^iz) = 23fJ 


r(M + i)r(u + 1 ) 

u + v + l ' r(M + U + 2) 

(d + i/3 + 2)'^ 


(d + i/3 + 2Y - 2^ 


c+Ci^(d + i/3)e* 


2(^+1 0 

ck(u,v) = ri(0,u) + —r^v{u,v) 


Ex = 


d dv 
21 - 27r2 


D =1 


3-^ -2<i 9d ' 

The limit law for the normalized internal path length of random $d$-dimensional quadtrees was first obtained in [38]; see also [4, 7, 34]. The asymptotic behavior of the normalized number of leaves together with its phase change was first discovered in [7]; see also [9, 23, 24, 25] for closely related types of phase changes.

We now describe the joint behavior of $\Xi_n$ and $L_n$. A random variable $U$ uniformly distributed over the unit hypercube $[0, 1]^d$ decomposes this cube into $2^d$ quadrants by drawing the $d$ hyperplanes through $U$ perpendicular to the edges of the cube. Choose an ordering of these quadrants and denote their volumes by $\langle U \rangle_1, \dots, \langle U \rangle_{2^d}$; see [38, Section 2]. Now define the
map Tq^ad by (with ^2 := 


Tquad : ^ M 


IxC 


C(Z,W)^c{ 


{U)r 0 

0 {U)y 


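with conditions on independence and distributions as in (33), and

$$b_Q := 1 + \frac{2}{d} \sum_{1 \le r \le 2^d} \langle U \rangle_r \log \langle U \rangle_r.$$

Then Lemma 4.1 and its proof also apply to the map $T_{\mathrm{quad}}$ as long as $d \ge 9$. The normalization used is given by

$$\mathcal{Y}_n := \Bigl(\frac{\Xi_n - \mathbb{E}(\Xi_n)}{n},\ \frac{L_n - \chi_d\, n}{n^{\alpha}}\Bigr), \qquad (n \ge 1). \tag{50}$$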
Rewrite (49) as

$$\mathbb{E}(L_n) = \chi_d\, n + \Re(\vartheta n^{\alpha + i\beta}) + o(n^{\alpha}), \tag{51}$$

where $\vartheta = 2c_+$.

Theorem 5.5. Assume $d \ge 9$. Let $\mathcal{Y}_n$ denote the normalization of the internal path length and the number of leaves in a random $d$-dimensional quadtree defined in (50). Denote by $\mathcal{L}(X_{\mathrm{quad}}, \Lambda_{\mathrm{quad}})$ the unique fixed point of the restriction of $T_{\mathrm{quad}}$ to $\mathcal{M}_2^{\mathbb{R} \times \mathbb{C}}((0, \vartheta))$ with $\vartheta$ defined in (51). Then we have

$$\ell_2\bigl(\mathcal{Y}_n, (X_{\mathrm{quad}}, \Re(n^{i\beta}\Lambda_{\mathrm{quad}}))\bigr) \to 0.$$

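For the remaining range $1 \le d \le 8$, we define $b'_{\mathrm{quad}} := (E_X^{-1/2} b_Q, 0)^t$ and the map $T'_{\mathrm{quad}}$ on $\mathcal{M}^2$:

$$T'_{\mathrm{quad}} : \mathcal{M}^2 \to \mathcal{M}^2, \qquad \mathcal{L}(Z, W) \mapsto \mathcal{L}\left(\sum_{1 \le r \le 2^d} \begin{pmatrix} \langle U \rangle_r & 0 \\ 0 & \langle U \rangle_r^{1/2} \end{pmatrix} \begin{pmatrix} Z^{(r)} \\ W^{(r)} \end{pmatrix} + b'_{\mathrm{quad}}\right),$$

with conditions on independence and distributions as in (41). Similarly, Lemma 4.2 and its proof again apply to $T'_{\mathrm{quad}}$ and imply that the restriction of $T'_{\mathrm{quad}}$ to $\mathcal{M}_3^2(0, \mathrm{Id}_2)$ has a unique fixed point $\mathcal{L}(X'_{\mathrm{quad}}, \Lambda'_{\mathrm{quad}})$.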

Theorem 5.6. Assume $1 \le d \le 8$. Let $W_n = (\Xi_n, L_n)$ denote the vector of internal path length and the number of leaves in a random $d$-dimensional quadtree. With $\mathcal{L}(X'_{\mathrm{quad}}, \Lambda'_{\mathrm{quad}})$ as above, we have

$$\mathrm{Cov}(W_n)^{-1/2}(W_n - \mathbb{E}[W_n]) \overset{d}{\longrightarrow} (X'_{\mathrm{quad}}, \Lambda'_{\mathrm{quad}}),$$

where $\Lambda'_{\mathrm{quad}}$ has a standard normal distribution, and $X'_{\mathrm{quad}}$ and $\Lambda'_{\mathrm{quad}}$ are independent.

The case when $d = 1$ corresponds to binary search trees, or equivalently, to Hoare's quicksort, and the above theorem can be re-worded as follows. The number of comparisons and the number of partitioning stages used by Hoare's quicksort are asymptotically uncorrelated and independent. Note that our results in the previous section for random FBBSTs give indeed a stronger statement for the asymptotic independence or asymptotic periodicity for quicksort using median-of-$(2t + 1)$.


5.3 More general shape parameters 


Our study can be extended to other shape parameters. For random $m$-ary search trees, the generality of Proposition 3.1 provides an effective means of widening our study to a broader class of "toll functions" in the definitions of $S_n$, $K_n$ and $N_n$. For example, the following extensions are straightforward.


Sn = + ■ ■ ■ + + 


c -f o(n ^), 

-.a— 1 \ 


oin 


if 2 ^ m ^ 13; 
if m ^ 14 


for some constant c, and 


(52) 
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< oo, 


(53) 


- Kn = + ■ ■ ■ + + n + tn with 


tn = o(n) and 




and 

- Nn = H-h + 5 }^^ H-h 5 }”^^ + tn, where the SnS satisfy (52) and tn satisfies 

(53). ' 


Because the same iff-condition (53) also appears in the recurrence relations arising from 
the two other classes of random trees (see [7, 8]), exactly the same conditions can be used to 
extend the consideration for FBBSTs and quadtrees. Details are omitted here. 